ETAPS Foreword


Welcome to the 27th ETAPS! ETAPS 2024 took place in Luxembourg City, the 
beautiful capital of Luxembourg. 

ETAPS 2024 is the 27th instance of the European Joint Conferences on Theory and 
Practice of Software. ETAPS is an annual federated conference established in 1998, 
and consists of four conferences: ESOP, FASE, FoSSaCS, and TACAS. Each con- 
ference has its own Program Committee (PC) and its own Steering Committee (SC). 
The conferences cover various aspects of software systems, ranging from theoretical 
computer science to foundations of programming languages, analysis tools, and formal 
approaches to software engineering. Organising these conferences in a coherent, highly 
synchronized conference programme enables researchers to participate in an exciting 
event, having the possibility to meet many colleagues working in different directions in 
the field, and to easily attend talks of different conferences. On the weekend before the 
main conference, numerous satellite workshops took place that attracted many 
researchers from all over the globe. 

ETAPS 2024 received 352 submissions in total, 117 of which were accepted, 
yielding an overall acceptance rate of 33%. I thank all the authors for their interest in 
ETAPS, all the reviewers for their reviewing efforts, the PC members for their con- 
tributions, and in particular the PC (co-)chairs for their hard work in running this entire 
intensive process. Last but not least, my congratulations to all authors of the accepted 
papers! 

ETAPS 2024 featured the unifying invited speakers Sandrine Blazy (University of 
Rennes, France) and Lars Birkedal (Aarhus University, Denmark), and the invited 
speakers Ruzica Piskac (Yale University, USA) for TACAS and Jérôme Leroux
(Laboratoire Bordelais de Recherche en Informatique, France) for FoSSaCS. Invited 
tutorials were provided by Tamar Sharon (Radboud University, the Netherlands) on 
computer ethics and David Monniaux (Verimag, France) on abstract interpretation. 

As part of the programme we had the first ETAPS industry day. The goal of this day 
was to bring industrial practitioners into the heart of the research community and to 
catalyze the interaction between industry and academia. The day was organized by 
Nikolai Kosmatov (Thales Research and Technology, France) and Andrzej Wasowski 
(IT University of Copenhagen, Denmark). 

ETAPS 2024 was organized by the SnT - Interdisciplinary Centre for Security, 
Reliability and Trust, University of Luxembourg. The University of Luxembourg was 
founded in 2003. The university is one of the best and most international young 
universities with 6,000 students from 130 countries and 1,500 academics from all over 
the globe. The local organisation team consisted of Peter Y.A. Ryan (general chair), 
Peter B. Roenne (organisation chair), Maxime Cordy and Renzo Gaston Degiovanni 
(workshop chairs), Magali Martin and Isana Nascimento (event managers), Marjan
Skrobot (publicity chair), and Afonso Arriaga (local proceedings chair). This team also
organised the online edition of ETAPS 2021, and now we are happy that they agreed to
also organise a physical edition of ETAPS.

ETAPS 2024 is further supported by the following associations and societies: 
ETAPS e.V., EATCS (European Association for Theoretical Computer Science), 
EAPLS (European Association for Programming Languages and Systems), and EASST 
(European Association of Software Science and Technology). 

The ETAPS Steering Committee consists of an Executive Board, and representa- 
tives of the individual ETAPS conferences, as well as representatives of EATCS, 
EAPLS, and EASST. The Executive Board consists of Marieke Huisman (Twente, 
chair), Andrzej Wasowski (Copenhagen), Thomas Noll (Aachen), Jan Kofroň (Prague),
Barbara König (Duisburg), Arnd Hartmanns (Twente), Caterina Urban (Inria), Jan
Křetínský (Munich), Elizabeth Polgreen (Edinburgh), and Lenore Zuck (Chicago). 

Other members of the steering committee are: Maurice ter Beek (Pisa), Dirk Beyer 
(Munich), Artur Boronat (Leicester), Luís Caires (Lisboa), Ana Cavalcanti (York), 
Ferruccio Damiani (Torino), Bernd Finkbeiner (Saarland), Gordon Fraser (Passau), 
Arie Gurfinkel (Waterloo), Reiner Hähnle (Darmstadt), Reiko Heckel (Leicester),
Marijn Heule (Pittsburgh), Joost-Pieter Katoen (Aachen and Twente), Delia Kesner 
(Paris), Naoki Kobayashi (Tokyo), Fabrice Kordon (Paris), Laura Kovács (Vienna), 
Mark Lawford (Hamilton), Tiziana Margaria (Limerick), Claudio Menghi (Hamilton 
and Bergamo), Andrzej Murawski (Oxford), Laure Petrucci (Paris), Peter Y.A. Ryan 
(Luxembourg), Don Sannella (Edinburgh), Viktor Vafeiadis (Kaiserslautern), Stepha- 
nie Weirich (Pennsylvania), Anton Wijs (Eindhoven), and James Worrell (Oxford). 

I would like to take this opportunity to thank all authors, keynote speakers, atten- 
dees, organizers of the satellite workshops, and Springer Nature for their support. 
ETAPS 2024 was also generously supported by a RESCOM grant from the Luxem- 
bourg National Research Foundation (project 18015543). I hope you all enjoyed 
ETAPS 2024. 

Finally, a big thanks to both Peters, Magali and Isana and their local organization 
team for all their enormous efforts to make ETAPS a fantastic event. 


April 2024 Marieke Huisman 
ETAPS SC Chair 
ETAPS e.V. President 


Preface 


This three-volume proceedings contains the papers presented at the 30th International 
Conference on Tools and Algorithms for the Construction and Analysis of Systems 
(TACAS 2024). TACAS 2024 was part of the 27th European Joint Conferences on 
Theory and Practice of Software (ETAPS 2024), which was held between April 6-11, 
2024, in Luxembourg City, Luxembourg. 

TACAS is a forum for researchers, developers and users interested in rigorous tools 
and algorithms for the construction and analysis of systems. The conference aims to 
bridge the gaps between different communities with this common interest and to 
support them in their quest to improve the utility, reliability, flexibility, and efficiency 
of tools and algorithms for building systems. TACAS 2024 interleaves and integrates 
various disciplines, including formal verification of software and hardware systems, 
static analysis, probabilistic programming, program synthesis, concurrency, testing, 
simulations, verification of machine learning/autonomous systems, Cyber-Physical 
Systems, SAT/SMT solving, automated and interactive theorem proving, and proof 
checking. 

There were four submission categories for TACAS 2024: 


1. Regular research papers identifying and justifying a principled advance to the
theoretical foundations for the construction and analysis of systems.

2. Case study papers describing the application of techniques developed by the
community to a single problem or a set of problems of practical importance,
preferably in a real-world setting.

3. Regular tool papers presenting a novel tool or a new version of an existing tool
built using novel algorithmic and engineering techniques.

4. Tool demonstration papers demonstrating a new tool or application of an existing
tool on a significant case-study.


Regular research, case study, and regular tool paper submissions were restricted to 
16 pages, whereas tool demonstration papers to 6 pages, excluding the bibliography 
and appendices. 

TACAS 2024 received 159 submissions, consisting of 114 regular research papers, 
10 case study papers, 28 regular tool papers, and 7 tool demonstration papers. Each 
submission was assigned for review to at least three Program Committee (PC) mem- 
bers, who made use of subreviewers. Regular research papers were reviewed in double- 
blind mode, whereas case study, regular tool, and tool-demonstration papers were 
reviewed using a single-blind reviewing process. 

Similarly to previous years, it was possible to submit an artifact alongside a paper. 
Artifact submission was mandatory for regular tool and tool demo papers, and vol- 
untary for regular research and case study papers at TACAS 2024. An artifact might 
consist of a tool, models, proofs, or other data required for validation of the results 
of the paper. The Artifact Evaluation Committee (AEC) was tasked with reviewing the
artifacts, based on their documentation, ease of use, and, most importantly, whether the
results presented in the corresponding paper could be accurately reproduced. Most 
of the evaluation was carried out using a standardized virtual machine to ensure 
consistency of the results, except for those artifacts that had special hardware or 
software requirements. Artifact evaluation at TACAS 2024 consisted of two rounds. 
The first round implemented the mandatory artifact evaluation of regular tool and tool 
demonstration papers; this round was carried out in parallel with the work of the PC. 
The judgment of the AEC was communicated to the PC and weighed in their dis- 
cussion. The second round of artifact evaluation carried out the voluntary artifact 
evaluation of regular research and case study papers, and took place after paper 
acceptance notifications were sent out; authors of accepted regular research and case 
study papers were able to update and revise their respective artifacts before artifact 
evaluation started. In both rounds, the AEC provided 3 reviews per artifact and 
anonymously communicated with the authors to resolve apparent technical issues. In 
total, 104 artifacts were submitted and the AEC evaluated a total of 62 artifacts 
regarding their availability, functionality, and/or reusability. Papers with an artifact that 
were successfully evaluated include one or more badges on the first page, certifying the 
respective properties. 

Selected papers were requested to provide a rebuttal in case a PC review gave rise to 
questions. Using the review reports and rebuttals, the PC had a thorough discussion on 
each paper. For regular tool and tool demonstration papers, the PC also discussed the 
corresponding artifact, using the AEC recommendations. As a result, the PC decided to 
accept 53 papers, out of which there were 35 regular research papers, 11 regular tool 
papers, 3 case study papers, and 4 tool demonstration papers. This corresponds to an 
overall acceptance rate of 33%. Each accepted paper at TACAS 2024 had either all
positive reviews and/or a “championing” PC member who argued in favor of accepting 
the paper. All accepted papers at TACAS 2024 had a positive average review score. 

TACAS 2024 also hosted SV-COMP 2024, the 13th International Competition on 
Software Verification. This event to compare tools evaluated 59 software systems for 
automatic verification of C and Java programs and 17 software systems for witness 
validation. The TACAS 2024 proceedings contains a competition report by the SV- 
Comp chair and organizer. From the 46 actively participating teams, the SV-Comp jury 
selected 16 short papers that describe the participating verification and validation 
systems. These 16 short papers are also published in the proceedings and were 
reviewed by a separate program committee (jury); each of these short papers was 
assessed by at least four jury members. Two sessions in the TACAS 2024 program 
were reserved for the presentation of the results: (1) a presentation session with a report 
by the competition chair and summaries by the developer teams of participating tools, 
and (2) an open community meeting in the second session. 

We would like to thank everyone who helped to make TACAS 2024 successful. We 
thank the authors for submitting their papers to TACAS 2024. The PC members and 
additional reviewers did an excellent job in reviewing papers: they provided detailed 
reports and engaged in the PC discussions. We thank the TACAS steering committee, 
and especially its chair, Joost-Pieter Katoen, for his valuable advice. We are grateful to 
the ETAPS steering committee, and in particular its chair, Marieke Huisman, for 
supporting our changes and suggestions on the TACAS 2024 review process and final 
program. We also acknowledge the invaluable support provided by the EasyChair
developers. Lastly, we would like to thank the overall organization team of ETAPS 
2024. 
Abstract. Generating proofs of unsatisfiability is a valuable capability
of most SAT solvers, and is an active area of research for SMT solvers. 
This paper introduces the first method to efficiently generate proofs of 
unsatisfiability specifically for an important subset of SMT: SAT Mod- 
ulo Monotonic Theories (SMMT), which includes many useful finite- 
domain theories (e.g., bit vectors and many graph-theoretic properties) 
and is used in production at Amazon Web Services. Our method uses 
propositional definitions of the theory predicates, from which it generates 
compact Horn approximations of the definitions, which lead to efficient 
DRAT proofs, leveraging the large investment the SAT community has 
made in DRAT. In experiments on practical SMMT problems, our proof 
generation overhead is minimal (7.41% geometric mean slowdown, 28.8% 
worst-case), and we can generate and check proofs for many problems 
that were previously intractable. 


An extended version of this paper, which includes appendices with proofs and
additional results, is available at https://doi.org/10.48550/arXiv.2401.10703


1 Introduction 


This paper introduces the first method to efficiently generate and check proofs
of unsatisfiability for SAT Modulo Monotonic Theories (SMMT), an important 
fragment of general SMT. The motivation for this work rests on these premises: 


— Proofs of UNSAT are valuable, for propositional SAT as well as SMT. Ob- 
viously, an independently checkable proof increases trust, which is impor- 
tant because an incorrect UNSAT result can result in certifying correctness 
of an incorrect system. Additionally, proofs are useful for computing ab- 
stractions via interpolation in many application domains including 
model checking [30] and software analysis [29,23].

© The Author(s) 2024 
— SMMT is a worthy fragment of SMT as a research target. SMMT [9] is a
technique for efficiently supporting finite, monotonic theories in SMT solvers.
E.g., reachability in a graph is monotonic in the sense that adding edges to
the graph only increases reachability, and an example SMMT query would
be whether there exists a configuration of edges such that node a can reach
node b, but node c can't reach node d. (More formal background on SMMT is
in Sec. 2.2.) The most used SMMT theories are graph reachability and max-flow,
along with bit-vector addition and comparison. Applications include
circuit escape routing [11], CTL synthesis [28], virtual data center allocation [12],
and cloud network security and debugging [2,8], with the last two
applications being deployed in production by Amazon Web Services (AWS).
Indeed, our research was specifically driven by industrial demand.

— DRAT is a desirable proof format. (Here, we include related formats like
DRUP [27], GRIT [19], and LRAT [18]. DRAT is explained in Sec. 2.1.)
For an independent assurance of correctness, the proof checker is the critical,
trusted component, and hence must be as trustworthy as possible. For
(propositional) SAT, the community has coalesced around the DRAT proof
format [37], for which there exist independent, efficient proof checkers [37],
mechanically verified proof checkers [38], and even combinations that are
fast as well as mechanically proven [18]. The ability to emit DRAT proof
certificates has been required for solvers in the annual SAT Competition
since 2014.

— Unfortunately, DRAT is propositional, so general SMT solvers need additional
mechanisms to handle theory reasoning [6]. For example, Z3 outputs
natural-deduction-style proofs [31], which can be reconstructed inside
the interactive theorem prover Isabelle/HOL [14,15]. Similarly, veriT [16]
produces resolution proof traces with theory lemmas, and supports proof
reconstruction in both Coq [1] and Isabelle [21,15]. As a more general approach,
CVC4 [7] produces proofs in the LFSC format [36], which is a meta-logic
that allows describing theory-specific proof rules for different SMT theories.
Nevertheless, given the virtues of DRAT, SMT solvers have started
to harness it for the propositional reasoning, e.g., CVC4 supports DRAT
proofs for bit-blasting of the bit-vector theory, which are then translated
into LFSC [34], and Otoni et al. [33] propose a DRAT-based proof certificate
format for propositional reasoning that they extend with theory-specific
certificates. However, in both cases, the final proof certificate is not purely
DRAT, and any theory lemmas must be checked by theory-specific certificate
checkers.

— For typical finite-domain theories, defining theory predicates propositionally
is relatively straightforward. The skills to design and implement theory-specific
proof systems are specialized and not widely taught. In contrast, if we
treat a theory predicate as simply a Boolean function, then anyone with basic
digital design skills can build a circuit to compute the predicate (possibly
using readily available commercial tools) and then apply the Tseitin transform
to convert the circuit to CNF. (This is known as “bit-blasting”, but we
will see later that conventional bit-blasting is too inefficient for SMMT.)
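
As a rough illustration of the last premise, the following Python sketch applies
the Tseitin transform to a tiny circuit and checks by enumeration that the
resulting CNF pins down the gate outputs for every input. It is not taken from
the paper: the gate-list representation, the function names, and the toy circuit
p = (a AND b) OR c are assumptions made here for illustration only.

```python
from itertools import product

def tseitin(gates):
    """Encode gates as CNF. Each gate is (out, op, inputs), where variables
    are positive integers and literals are signed integers (DIMACS style)."""
    cnf = []
    for out, op, ins in gates:
        if op == "AND":
            cnf += [[-out, i] for i in ins]          # out -> each input
            cnf.append([out] + [-i for i in ins])    # all inputs -> out
        elif op == "OR":
            cnf += [[out, -i] for i in ins]          # any input -> out
            cnf.append([-out] + list(ins))           # out -> some input
        elif op == "NOT":
            (i,) = ins
            cnf += [[-out, -i], [out, i]]            # out <-> not input
        else:
            raise ValueError(op)
    return cnf

def satisfies(cnf, assignment):
    """assignment maps every variable to a bool."""
    return all(any(assignment[abs(l)] == (l > 0) for l in c) for c in cnf)

# Hypothetical toy circuit: inputs a=1, b=2, c=3; g=4 is a AND b; p=5 is g OR c.
GATES = [(4, "AND", (1, 2)), (5, "OR", (4, 3))]
CNF = tseitin(GATES)

def check_definition():
    """For every input, exactly one value of (g, p) satisfies the CNF,
    and it agrees with evaluating the circuit directly."""
    for a, b, c in product([False, True], repeat=3):
        g = a and b
        p = g or c
        models = [(gv, pv) for gv, pv in product([False, True], repeat=2)
                  if satisfies(CNF, {1: a, 2: b, 3: c, 4: gv, 5: pv})]
        if models != [(g, p)]:
            return False
    return True
```

The encoding grows linearly with the number of gates, which is why building the
circuit is the only theory-specific work in this style of definition.
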
From a practical, user-level perspective, the contribution of this paper is
the first efficient proof-generating method for SMMT. Our method scales to 
industrial-size instances and generates pure DRAT proofs. 

From a theoretical perspective, the following contributions underlie our 
method: 


— We introduce the notion of one-sided propositional definitions for refutation 
proof. Having different definitions for a predicate vs. its complement allows 
for more compact and efficient constructions. 

— We show that SMMT theories expressed in Horn theory enable linear-time 
(in the size of the Horn definition) theory lemma checking via reverse unit 
propagation (RUP), and hence DRAT. 

— We propose an on-the-fly transformation that uses hints from the SMMT 
solver to over-approximate any CNF encoding of a monotonic theory pred- 
icate into a linear-size Horn upper-bound, and prove that the Horn upper- 
bound is sufficient for checking theory lemmas in any given proof via RUP. 

— We present efficient, practical propositional definitions for the main mono- 
tonic theories used in practice: bit-vector summation and comparison, and 
reachability and max-flow on symbolic graphs. 


(As an additional minor contribution, we adapt the BackwardCheck procedure 
from DRAT-Trim [27] for use with SMT, and evaluate its effectiveness in our 
proof checker.) 

We implemented our method in the MonoSAT SMMT solver. For evaluation,
tion, we use two sets of benchmarks derived from practical, industrial problems: 
multilayer escape routing [11], and cloud network reachability [2]. Our results
show minimal runtime overhead on the solver (geometric mean slowdown 7.4%,
worst-case 28.8% in our experiments), and we generate and check proofs for 
many problem instances that are otherwise intractable. 


2 Background 


2.1 Propositional SAT and DRAT 


We assume the reader is familiar with standard propositional satisfiability on
CNF. Some notational conventions in our paper are: we use lowercase letters
for literals and uppercase letters for clauses (or other sets of literals); for a
literal $x$, we denote the variable of $x$ by $var(x)$; we will interchangeably treat an
assignment either as a mapping of variables to truth values $\top$ (true) or $\bot$ (false),
or as a set of non-conflicting (i.e., does not contain both $x$ and its complement $\overline{x}$)
literals, with positive (negative) literals for variables assigned $\top$ ($\bot$); assignments
can be total (assigns truth values to every variable) or partial (some variables
unassigned); and given a formula $F$ and assignment $M$, we use the vertical bar
$F|_M$ to denote reducing the formula by the assignment, i.e., discarding falsified
literals from clauses and satisfied clauses from the formula. (An empty clause
denotes $\bot$; an empty formula, $\top$.)

4 Available at https://github.com/NickF0211/MonoProof

This paper focuses on proofs of unsatisfiability. In proving a formula $F$ UNSAT,
a clause $C$ is redundant if $F$ and $F \wedge C$ are equisatisfiable [26]. A proof of
unsatisfiability is simply a sequence of redundant clauses culminating in $\bot$, but
where the redundancy of each clause can be easily checked. However, checking
redundancy is coNP-hard. A clause that is implied by $F$, which we denote by
$F \models C$, is guaranteed redundant, and we can check implication by checking the
unsatisfiability of $F \wedge \overline{C}$, but this is still coNP-complete. Hence, proofs use
restricted proof rules that guarantee redundancy. For example, the first automated
proofs of UNSAT used resolution to generate implied clauses, until implying $\bot$
by resolving a literal $l$ with its complement $\overline{l}$ [20,39]. In practice, however,
resolution proofs grow too large on industrial-scale problems.

DRAT [37] is a much more compact and efficient system for proving unsatisfiability.
It is based on reverse unit propagation (RUP), which we explain here.⁵
A unit clause is a clause containing one literal. If $L$ is the set of literals appearing
in the unit clauses of a formula $F$, the unit clause rule computes $F|_L$, and the
repeated application of the unit clause rule until a fixpoint is called unit propagation
(aka Boolean constraint propagation). Given a clause $C$, its negation $\overline{C}$
is a set of unit clauses, and we denote by $F \vdash_1 C$ if $F \wedge \overline{C}$ derives a conflict
through unit propagation. Notice that $F \vdash_1 C$ implies $F \models C$, but is computationally
easy to check. The key insight behind RUP is that modern CDCL
SAT solvers make progress by deriving learned clauses, whose redundancy is,
by construction, checkable via unit propagation. Proof generation, therefore, is
essentially just logging the sequence of learned clauses leading to $\bot$, and proof
checking is efficiently checking $\vdash_1$ of the relevant learned clauses.


2.2 SAT Modulo Monotonic Theories (SMMT)

We define a Boolean positive monotonic predicate as follows: 


Definition 1 (Positive Monotonic Predicate). A predicate $p : \{0,1\}^n \to \{0,1\}$
is positively monotonic with respect to the input $a_i$ iff

$$p(a_1, \ldots, a_{i-1}, 0, a_{i+1}, \ldots) \implies p(a_1, \ldots, a_{i-1}, 1, a_{i+1}, \ldots)$$


The predicate p is a positive monotonic predicate iff p is positively monotonic 
with respect to every input. 


Negative monotonic predicates are defined analogously. If a predicate $p$ is
positively monotonic w.r.t. some inputs $A^+$ and negatively monotonic w.r.t. the
rest of inputs $A^-$, it is always possible to rewrite the predicate as a positive
monotonic predicate $p'$ over input $A^+$ and $\{\overline{a} \mid a \in A^-\}$. For ease of exposition,
and without loss of generality, we will describe our theoretical results assuming
positive monotonic predicates only (except where noted otherwise).

5 RUP is all we use in this paper. RAT is a superset of RUP, by essentially doing
one step of resolution as a “lookahead” before checking RUP of the resolvents. The
“D” in DRAT stands for “deletion”, meaning the proof format also records clause
deletions.
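
The definition and the rewriting step can be checked by brute force on small
predicates. The Python sketch below is an illustration only: the helper names
and the 2-bit comparison predicate are assumptions introduced here, and
exhaustive enumeration is viable only for a handful of inputs.

```python
from itertools import product

def is_positively_monotonic_in(p, n, i):
    """Definition 1 for input i: whenever p holds with input i at 0,
    it must still hold with input i raised to 1."""
    for bits in product([0, 1], repeat=n):
        if bits[i] == 0 and p(*bits):
            raised = bits[:i] + (1,) + bits[i + 1:]
            if not p(*raised):
                return False
    return True

def is_positive_monotonic(p, n):
    """Positive monotonic predicate: monotonic in every input."""
    return all(is_positively_monotonic_in(p, n, i) for i in range(n))

def geq_2bit(a1, a0, b1, b0):
    """Hypothetical example predicate: a >= b on 2-bit unsigned values."""
    return int(2 * a1 + a0 >= 2 * b1 + b0)

def geq_2bit_rewritten(a1, a0, nb1, nb0):
    """Same predicate, but taking the complements of the bits of b as inputs."""
    return geq_2bit(a1, a0, 1 - nb1, 1 - nb0)
```

Here `geq_2bit` is positively monotonic in the bits of a but not in the bits of
b, while `geq_2bit_rewritten` is positively monotonic in all four inputs, which
is the effect of the rewriting described above.
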

Given a monotonic predicate $p$ over input $A$, we will use boldface $\mathbf{p}$ as the
predicate atom for $p$, i.e., the predicate atom is a Boolean variable in the CNF
encoding of the theory, indicating whether $p(A)$ is true or not. The theory of $p$
is the set of valid implications in the form of $M_A \Rightarrow \mathbf{p}$ where $M_A$ is a partial
assignment over $A$.

The following are the most used monotonic theories: 


Graph Reachability: Given a graph $G = (V, E)$, where $V$ and $E$ are sets of
vertices and edges, the graph reachability theory contains the reachability
predicates $reach_u^v$ on the input variables $e_1, e_2, \ldots, e_n \in E$, where $u, v \in V$.
The predicate holds iff node $u$ can reach $v$ in the graph $G$ by using only
the subset of edges whose corresponding variable $e_i$ is true. The predicate
is positively monotonic because enabling more edges will not make reachable
nodes unreachable, and disabling edges will not make unreachable nodes
reachable.

Bit-Vector Summation and Comparison: Given two bit-vectors (BV) $\vec{a}$
and $\vec{b}$, the theory of BV comparison contains the predicate $\vec{a} \geq \vec{b}$, whose
inputs are the bits of $\vec{a}$ and $\vec{b}$. The predicate holds iff the value (interpreted
as an integer) of $\vec{a}$ is greater than or equal to the value of $\vec{b}$. The predicate is
positively monotonic for the variables of $\vec{a}$ and negatively monotonic for
the variables of $\vec{b}$, because changing any 0 to a 1 in $\vec{a}$ makes it bigger, and
changing any 1 to 0 in $\vec{b}$ makes it smaller. Similarly, given two sets of BVs
$A$ and $B$, the theory of comparison between sums contains the predicate
$\sum A \geq \sum B$ whose inputs are the Boolean variables from all BVs in $A$ and
$B$. The predicate holds iff the sum of the BVs in $A$ is greater than or equal to
the sum of the BVs in $B$, and is positively monotonic in $A$ and negatively
monotonic in $B$.

S-T Max Flow: Given a graph $G = (V, E)$, for every edge $e \in E$, let its capacity
be represented by the BV $\vec{cap}_e$. For two vertices $s, t \in V$, and a BV $\vec{z}$, the
max-flow theory contains the predicates $MF_s^t \geq \vec{z}$ over the input variables
$e_1, e_2, \ldots, e_n \in E$ and $\vec{cap}_{e_1}, \vec{cap}_{e_2}, \ldots, \vec{cap}_{e_n}$. The predicate holds iff the
maximum flow from the source $s$ to the target $t$ is greater than or equal to $\vec{z}$, using
only the enabled edges (as in the reachability theory) with their specified
capacities.


The SMMT Framework describes how to extend a SAT or SMT solver 
with Boolean monotonic theories. The framework has been implemented in the 
SMT solver MonoSAT, which has been deployed in production by Amazon Web 
Services to reason about a wide range of network properties [28]. The framework 
performs theory propagation and clause learning for SMMT theories as follows: 
(In this description, we use P for the set of positive monotonic predicates, and 
S for the set of Boolean variables that are arguments to the predicates.) 


Theory Propagation: Given a partial assignment $M$, let $M_S$ be the partial
assignment over $S$. The SMMT framework forms two complete assignments
of $M_S$: one with all unassigned $s$ atoms assigned to false ($M_S^-$), one with
all unassigned $s$ atoms assigned to true ($M_S^+$). Since $M_S^-$ and $M_S^+$ are each
complete assignments of $S$, they can be used to determine the value of $P$
atoms. Since every $p \in P$ is positively monotonic, (1) if $M_S^- \Rightarrow p$, then
$M_S \Rightarrow p$, and (2) if $M_S^+ \Rightarrow \neg p$, then $M_S \Rightarrow \neg p$. The framework uses $M_S^-$
and $M_S^+$ as the under- and over-approximation for theory propagation over
$P$ atoms. Moreover, the framework attaches $M_S \Rightarrow p$ or $M_S \Rightarrow \neg p$ as the
reason clause for the theory propagation.

Clause Learning: For some predicates, a witness can be efficiently generated
during theory propagation, as a sufficient condition to imply the predicate $p$.
For example, in graph reachability, suppose $M_S^- \Rightarrow reach_u^v$ for a given
under-approximation $M_S^-$. Standard reachability algorithms can efficiently
find a set of edges $M_S' \subseteq M_S$ that forms a path from $u$ to $v$. When such
a witness is available, instead of learning $M_S \Rightarrow p$, the framework would
use the path witness to learn the stronger clause $M_S' \Rightarrow p$. Witness-based
clause learning is theory specific (and implementation specific); if a witness
is not available or cannot be efficiently generated in practice for a particular
predicate, the framework will learn the weaker clause $M_S \Rightarrow p$.
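
The two steps above can be sketched for the reachability theory as follows.
This is a simplified Python illustration, not MonoSAT's implementation: the
edge dictionary, the function names, and the choice to return the reason as a
dictionary of edge assignments are all assumptions made here.

```python
from collections import deque

def find_path(edges, enabled, src, dst):
    """Breadth-first search using only `enabled` edges.
    edges maps an edge name to its (from, to) pair.
    Returns the edge names along one path, or None if dst is unreachable."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while parent[node] is not None:
                name, node = parent[node]
                path.append(name)
            return path[::-1]
        for name, (u, v) in edges.items():
            if name in enabled and u == node and v not in parent:
                parent[v] = (name, node)
                queue.append(v)
    return None

def propagate_reach(edges, partial, src, dst):
    """partial maps assigned edge names to booleans (the assignment M_S).
    Returns (value, reason): value is True/False if reach is implied,
    None otherwise; reason is the part of M_S used to justify it."""
    under = {e for e in edges if partial.get(e, False)}  # unassigned -> false
    over = {e for e in edges if partial.get(e, True)}    # unassigned -> true
    witness = find_path(edges, under, src, dst)
    if witness is not None:
        # Path witness: only the edges on the path are needed (stronger clause).
        return True, {e: True for e in witness}
    if find_path(edges, over, src, dst) is None:
        # No witness computed here: fall back to the whole assignment M_S.
        return False, dict(partial)
    return None, {}

# Hypothetical graph: u -> x -> v, a direct edge u -> v, and a side edge x -> y.
EDGES = {"a": ("u", "x"), "b": ("x", "v"), "c": ("u", "v"), "d": ("x", "y")}
```

For example, with edges a, b, and d enabled, the path witness keeps only a and
b, so the learned clause does not mention d.
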


3 Overview of Our Method 


Most leading SMT solvers, including MonoSAT, use the DPLL(T) frame- 
work [22], in which a CDCL propositional SAT solver coordinates one or more 
theory-specific solvers. A DPLL(T) solver behaves similarly to a CDCL proposi- 
tional SAT solver — making decisions, performing unit propagation, analyzing 
conflicts, learning conflict clauses — except that the theory solvers will also in- 
troduce new clauses (i.e., theory lemmas) into the clause database, which were 
derived via theory reasoning, and whose correctness relies on the semantics of 
the underlying SMT theory. These theory lemmas cannot (in general) be de- 
rived from the initial clause database, and so cannot be verified using DRAT. 
Therefore, the problem of producing a proof of UNSAT in SMT reduces to the 
problem of proving the theory lemmas. 

A direct approach would be to have the SMT solver emit a partial DRAT 
proof certificate, in which each theory lemma is treated as an axiom. This par- 
tial proof is DRAT-checkable, but each theory lemma becomes a new proof 
obligation. The theory lemmas could subsequently be verified using external 
(non-DRAT), trusted, theory-specific proof-checking procedures. This is the ap- 
proach recently proposed by Otoni et al. [33]. 
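
As a small illustration of this direct approach, the Python sketch below
serializes a solver's learned clauses in the textual DRAT format (one clause
per line as signed integers terminated by 0, with deletions prefixed by "d")
and collects the theory lemmas separately as proof obligations. The tagging of
clauses as "prop", "theory", or "delete" is a hypothetical convention used only
in this example.

```python
def emit_partial_proof(learned):
    """learned: list of (kind, clause) pairs, kind in {"prop", "theory", "delete"}.
    Returns (drat_text, obligations). Theory lemmas are written into the proof
    like any other added clause, and also returned as separate obligations."""
    drat_lines, obligations = [], []
    for kind, clause in learned:
        body = " ".join([*map(str, clause), "0"])
        if kind == "delete":
            drat_lines.append("d " + body)
        elif kind in ("prop", "theory"):
            drat_lines.append(body)
            if kind == "theory":
                obligations.append(list(clause))
        else:
            raise ValueError(kind)
    return "\n".join(drat_lines) + "\n", obligations

# Hypothetical trace: one theory lemma, one learned clause, a deletion,
# and finally the empty clause.
TRACE = [("theory", [-1, -2, 5]), ("prop", [5, 3]), ("delete", [5, 3]), ("prop", [])]
PROOF_TEXT, OBLIGATIONS = emit_partial_proof(TRACE)
```

A standard DRAT checker would reject the theory lemma line unless the clause is
derivable from the clause database, which is exactly why each obligation has to
be discharged separately in this approach.
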

We take such an approach as a starting point, but instead of theory-specific 
proof procedures, we use propositional definitions of the theory semantics to add 
clauses sufficient to prove (by RUP) the theory lemmas. The resulting proof is 
purely DRAT, checkable via standard DRAT checkers, with no theory-specific 
proof rules. Fig. 1 explains our approach in more detail; Sec. 4 dives into how we
derive the added clauses; and Sec. 5 gives sample propositional theory definitions.


DRAT Proofs of Unsatisfiability for SAT Modulo Monotonic Theories 9 


[Figure 1 (flowchart): instance.gnf → MonoSAT → proof certificate →
drat-trim-theory → core lemmas → drat-trim. The theory proof obligations and the
theory predicate definition (CNF) feed the instantiation-based Horn approximation,
which passes through a monotonic definition (CNF) to produce the proof-specific
Horn definition.]

Fig. 1. Overview of Our Proof Generation and Checking Method. Inputs (the problem
instance file and the propositional definitions of theory predicates) are colored blue;
new and modified components are colored orange. Starting from the top-left is the
SMMT problem instance, which is solved by MonoSAT. We extended MonoSAT to
emit a DRAT-style proof certificate, consisting of learned (via propositional or theory
reasoning) clauses, similar to what is proposed in [33]. The proof certificate is
optionally pre-processed by drat-trim-theory, in which we modified the BackwardCheck
procedure to perform a backward traversal from the final $\bot$, outputting a subset
of lemmas sufficient (combined with the original clause database) to derive $\bot$. This is
extra work (since a full BackwardCheck is later performed by unmodified drat-trim for
the final proof verification at the top-right of the figure), but allows us to avoid verifying
some theory lemmas that are not relevant to the final proof. The resulting core lemmas
are split between the propositional learned clauses, which go straight (right) to
drat-trim, and the theory learned clauses, which are our proof obligations. The heart of our
method is the instantiation-based Horn approximation (bottom-center, described in
Sec. 4). In this step, we use the proof obligations as hints to transform the pre-defined,
propositional theory definitions (bottom-left, examples in Sec. 5) into proof-specific
Horn definitions. The resulting proof-specific definitions together with the CNF from
the input instance can efficiently verify UNSAT using unmodified drat-trim [37].


4 Instantiation-Based Horn Approximation 


This section describes how we derive a set of clauses sufficient to make theory
lemmas DRAT-checkable. Section 4.1 introduces one-sided propositional definitions
and motivates the goal of a compact, Horn-clause-based definition. Section 4.2
gives a translation from an arbitrary propositional definition of a monotonic
predicate to a monotonic definition, as an intermediate step toward constructing
the final proof-specific, Horn definition in Section 4.3.


4.1 One-Sided Propositional Definitions and Horn Clauses 


Definition 2 (Propositional Definition). Let $p$ be the positive predicate
atom of predicate $p$ over Boolean arguments $A$. A propositional definition of
$p$, denoted as $\Sigma_p$, is a CNF formula over variables
$V \supseteq (\mathit{var}(p) \cup A)$ such that for every truth assignment $M$
to the variables in $A$, (1) $\Sigma_p|_M$ is satisfiable and
(2) $\Sigma_p \models (M \Rightarrow p)$ if and only if $p(M)$ is $\top$. The
propositional definition of $\bar{p}$ is defined analogously.


[Figure 2 graphic not reproduced; the right panel is labeled "cut".]

Fig. 2. Directed Graph for Running Example in Sec. 4. In the graph (left),
the reachability predicate $\mathit{reach}_s^t$ is a function of the edge inputs $a, \ldots, h$.


For example, the Tseitin-encoding of a logic circuit that computes $p(M)$ satisfies
this definition. However, note that a propositional definition for $p$ can be
one-sided: it is not required that $\Sigma_p \models (M \Rightarrow \bar{p})$ when
$p(M)$ is $\bot$. That case is handled by a separate propositional definition for
$\bar{p}$. We will see that this one-sidedness gives some freedom to admit more
compact definitions.

Given a propositional definition $\Sigma_p$, any theory lemma $M_A \Rightarrow p$ is a
logical consequence of $\Sigma_p$, but this might not be RUP checkable. One could prove
$\Sigma_p \models (M_A \Rightarrow p)$ by calling a proof-generating SAT solver on
$\Sigma_p \wedge \neg(M_A \Rightarrow p)$, i.e., bit-blasting the specific lemma, but we
will see experimentally (in Sec. 6) that this works poorly. However, if the
propositional definition is limited to Horn theory (i.e., each clause has at most one
positive literal), then every SMMT theory lemma can be proven by unit propagation:


Theorem 1. Let $p$ be a positive monotonic predicate over input $A$, and let
$\Sigma_p^h$ be a propositional definition for the positive atom $p$. If $\Sigma_p^h$ is
a set of Horn clauses, then for any theory lemma $M_A \Rightarrow p$ where $M_A$ is a set
of positive atoms from $A$, $\Sigma_p^h \models (M_A \Rightarrow p)$ if and only if
$\Sigma_p^h \vdash_1 (M_A \Rightarrow p)$.


Proof. Suppose $\Sigma_p^h \models (M_A \Rightarrow p)$; then
$\Sigma_p^h \wedge (M_A \wedge \bar{p})$ is unsatisfiable. Since $M_A \wedge \bar{p}$ is
equivalent to a set of unit clauses, $\Sigma_p^h \wedge (M_A \wedge \bar{p})$ still
contains only Horn clauses, so satisfiability can be determined by unit propagation.
The converse direction holds because unit propagation is sound.


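Example 1. Let $\mathit{reach}_s^t$ be the reachability predicate for the directed graph
shown in Fig. 2 (left). The definition schema for graph reachability in Sec. 5
yields the following set of Horn clauses $\Sigma^h_{\mathit{reach}_s^t}$:
(1) $\bar{s} \vee \bar{a} \vee v_1$, (2) $\bar{v}_1 \vee \bar{c} \vee v_3$,
(3) $\bar{v}_3 \vee \bar{h} \vee t$, (4) $\bar{s} \vee \bar{b} \vee v_2$,
(5) $\bar{v}_3 \vee \bar{e} \vee v_2$, (6) $\bar{v}_2 \vee \bar{d} \vee v_4$,
(7) $\bar{v}_4 \vee \bar{f} \vee v_3$, (8) $\bar{v}_4 \vee \bar{g} \vee t$,
(9) $\bar{t} \vee \mathit{reach}_s^t$, (10) $s$, where $v_1, \ldots, v_4$, $s$, and $t$ are
auxiliary variables. Any theory lemma of the form $M_A \Rightarrow p$, e.g.,
$\bar{a} \vee \bar{c} \vee \bar{h} \vee \mathit{reach}_s^t$, can be proven from
$\Sigma^h_{\mathit{reach}_s^t}$ via unit propagation. Also, note that one-sidedness allows
a simpler definition, despite the cycle in the graph, e.g., consider assignment
$M = \{a, b, \bar{c}, \bar{d}, e, f, g, h\}$. Then, $\mathit{reach}_s^t = \bot$, but
$\Sigma^h_{\mathit{reach}_s^t} \not\models (M \Rightarrow \overline{\mathit{reach}}_s^t)$.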
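The claim in Example 1 can be replayed with a few lines of Python. This is a minimal sketch, not MonoProof's or drat-trim's implementation: literals are written as strings (`"x"` or `"~x"`), `reach` stands for $\mathit{reach}_s^t$, and the direction of edge $e$ (from $v_3$ to $v_2$) is an assumption read off clause (5).

```python
def neg(lit):
    """Complement of a literal written as 'x' (positive) or '~x' (negative)."""
    return lit[1:] if lit.startswith("~") else "~" + lit


def unit_propagate(clauses, assumptions):
    """Propagate unit clauses; return True if a conflict is reached."""
    true_lits = set(assumptions)
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            if any(lit in true_lits for lit in clause):
                continue  # clause already satisfied
            open_lits = [lit for lit in clause if neg(lit) not in true_lits]
            if not open_lits:
                return True  # every literal is falsified: conflict
            if len(open_lits) == 1:
                true_lits.add(open_lits[0])  # forced literal
                changed = True
    return False


def rup_check(clauses, lemma):
    """A clause is RUP if assuming the negation of each literal gives a conflict."""
    return unit_propagate(clauses, [neg(lit) for lit in lemma])


# Clauses (1)-(10) of Example 1.
HORN_REACH = [
    ["~s", "~a", "v1"], ["~v1", "~c", "v3"], ["~v3", "~h", "t"],
    ["~s", "~b", "v2"], ["~v3", "~e", "v2"], ["~v2", "~d", "v4"],
    ["~v4", "~f", "v3"], ["~v4", "~g", "t"], ["~t", "reach"], ["s"],
]

# The positive lemma from Example 1 is derivable by unit propagation.
POSITIVE_LEMMA_OK = rup_check(HORN_REACH, ["~a", "~c", "~h", "reach"])

# One-sidedness: the negative lemma for M = {a, b, ~c, ~d, e, f, g, h} is not.
NEGATIVE_LEMMA_OK = rup_check(
    HORN_REACH, ["~a", "~b", "c", "d", "~e", "~f", "~g", "~h", "~reach"]
)
```

The second check fails by design: the positive definition says nothing about when the predicate is false, which is exactly the freedom that one-sidedness grants.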


Horn theory has limited expressiveness, but it is always sufficient to encode
a propositional definition for any SMMT theory: Given a monotonic predicate
atom $p$, we can always encode a Horn propositional definition $\Sigma_p^h$ as the
conjunction of all valid theory lemmas from the theory of $p$. This is because every
theory lemma is restricted to the form $(M_A \Rightarrow p)$, where $M_A$ is a set of
positive atoms (due to monotonicity). Hence, $\Sigma_p^h$ is a set of Horn clauses.
However, such a naive encoding blows up exponentially. Instead, we will seek a compact
Horn definition $\Sigma_p^h$ that approximates a non-Horn propositional definition
$\Sigma_p$:


Definition 3 (Horn Upper-Bound). Let $\Sigma_p$ be a propositional definition of
$p$. A set of Horn clauses $\Sigma_p^h$ is a Horn upper-bound if
$\Sigma_p \Rightarrow \Sigma_p^h$.


For the strongest proving power, we want the tightest Horn upper-bound 
possible. Unfortunately, the least Horn upper-bound of a non-Horn theory can 
still contain exponentially many Horn clauses [35]. Fortunately, we don't actually 
need a Horn upper-bound on the exact theory definition, but only of enough of 
the definition to prove the fixed set of theory lemmas that constitute the proof 
obligations. This motivates the next definition. 


Definition 4 (Proof-Specific Horn Definition). Given an exact definition
$\Sigma_p$ and a set of theory lemmas $O := \{C_1, \ldots, C_n\}$ from the theory of $p$,
a proof-specific Horn definition of $p$ is a Horn upper-bound $\Sigma_p^h$ of $\Sigma_p$
such that $\Sigma_p^h \vdash_1 C$ for every $C \in O$.


Our goal in the next two subsections is to derive such compact, proof-specific
Horn definitions.


Example 2. Continuing Ex. 1, given a proof obligation $O$ with two theory
lemmas: $\{\bar{a} \vee \bar{c} \vee \bar{h} \vee \mathit{reach}_s^t,\ \bar{b} \vee \bar{d} \vee \bar{g} \vee \mathit{reach}_s^t\}$,
the subset of Horn clauses with IDs (1), (2), (3), (4), (6), (8), (9) and (10) is a
proof-specific Horn definition for $\mathit{reach}_s^t$, which can be visualized in
Fig. 2 (middle).
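To see why the subset in Example 2 suffices for the two lemmas but is weaker than the full definition, the Horn clauses can be read as forward-chaining rules. The sketch below is illustrative only; the rule table `RULES` transcribes clauses (1) to (10) of Example 1 (with the edge $e$ assumed to go from $v_3$ to $v_2$), and `proves` is a hypothetical helper name.

```python
def closure(rules, facts):
    """Forward chaining over definite Horn rules given as (body_set, head)."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if head not in derived and body <= derived:
                derived.add(head)
                changed = True
    return derived


# Clause IDs from Example 1, written as rules "body implies head".
RULES = {
    1: ({"s", "a"}, "v1"), 2: ({"v1", "c"}, "v3"), 3: ({"v3", "h"}, "t"),
    4: ({"s", "b"}, "v2"), 5: ({"v3", "e"}, "v2"), 6: ({"v2", "d"}, "v4"),
    7: ({"v4", "f"}, "v3"), 8: ({"v4", "g"}, "t"), 9: ({"t"}, "reach"),
    10: (set(), "s"),
}

PROOF_SPECIFIC = [1, 2, 3, 4, 6, 8, 9, 10]  # the subset named in Example 2
FULL = list(range(1, 11))


def proves(clause_ids, enabled_edges):
    """Does the chosen clause subset derive `reach` from the enabled edges?"""
    rules = [RULES[i] for i in clause_ids]
    return "reach" in closure(rules, enabled_edges)
```

For instance, `proves(PROOF_SPECIFIC, {"b", "d", "f", "h"})` is false although the corresponding lemma is valid for the full definition: clause (7) is not needed for the two proof obligations and is therefore left out.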


Given a proof obligation O, we can make all theory lemmas in O DRAT 
checkable if we have exact propositional definitions for the theories and if we 
can dynamically transform them into compact, proof-specific Horn definitions 
at the time of proof checking. We simply add these additional clauses to the 
input of the DRAT-proof-checker. 


4.2 Monotonic Definitions 


The derivation of compact, proof-specific Horn definitions from arbitrary
propositional definitions is a two-step process: we first show that every propositional
definition for a monotonic predicate atom can be converted into a monotonic
definition of linear size (this section), and then use theory lemmas in the proof
obligations to create the Horn approximation of the definition (Sec. 4.3).


Definition 5 (Monotonic Definition). Let a monotonic predicate $p$ over input
$A$ be given. A CNF formula $\Sigma_p^{+}$ is a monotonic definition of the positive
predicate atom $p$ if $\Sigma_p^{+}$ is a propositional definition of $p$, and it
satisfies the following syntax restrictions: (1) $\Sigma_p^{+}$ does not contain
positive atoms from $A$, (2) $\Sigma_p^{+}$ does not contain $\bar{p}$, and (3) $p$
appears only in Horn clauses. The monotonic definition for $\bar{p}$ is defined
analogously.

We now define the procedure, MONOT, for transforming a propositional
definition into a linear-size monotonic definition:


Definition 6 (Monotonic Transformation). Let a monotonic predicate $p$
over input $A$ and a propositional definition $\Sigma_p$ for the positive predicate
atom $p$ be given. MONOT($p$, $\Sigma_p$) is the result of the following transformations
on $\Sigma_p$: (1) replace every occurrence of an input atom ($a$ for $a \in A$) in
$\Sigma_p$ with a new atom $a'$ ($\bar{a}$ is replaced with $\bar{a}'$), (2) replace
every occurrence of $p$ and $\bar{p}$ with $p'$ and $\bar{p}'$ respectively, and (3) add
the following Horn clauses: $a \Rightarrow a'$ for every $a \in A$, and $p' \Rightarrow p$.


Theorem 2 (Correctness of Monotonic Transformation). Given a monotonic
predicate $p$ over input $A$ and the monotonic predicate atom $p$, if we have
any propositional definition $\Sigma_p$ with $n$ clauses, then MONOT($p$, $\Sigma_p$)
results in a monotonic definition $\Sigma_p^{+}$ with at most $n + |A| + 1$ clauses.


The proof of Theorem 2 is in the extended version of this paper. The
correctness relies on the fact that the predicate $p$ is indeed monotonic, and that our
propositional definitions need only be one-sided. If the monotonic definition is
already in Horn theory, it can be used directly to verify theory lemmas via RUP;
otherwise, we proceed to Horn approximation, described next.
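The three rewriting steps of Definition 6 are mechanical, as the following Python sketch suggests. It is an illustration under assumptions, not the authors' code: clauses are lists of string literals (`"x"` or `"~x"`), a primed copy of a variable is written with a trailing apostrophe, and the sample definition (a Tseitin encoding of $p \Leftrightarrow a \wedge b$) is chosen here only as a small test case.

```python
def monot(clauses, inputs, pred):
    """Apply the MONOT steps of Definition 6 to a CNF given as literal lists."""

    def rename(lit):
        # Steps (1) and (2): prime every input atom and the predicate atom.
        var = lit.lstrip("~")
        if var in inputs or var == pred:
            var += "'"
        return ("~" if lit.startswith("~") else "") + var

    renamed = [[rename(lit) for lit in clause] for clause in clauses]
    # Step (3): add a => a' for every input a, and p' => p.
    links = [["~" + a, a + "'"] for a in sorted(inputs)]
    links.append(["~" + pred + "'", pred])
    return renamed + links


# Hypothetical sample: Tseitin encoding of p <=> (a AND b).
DEF_AND = [["~p", "a"], ["~p", "b"], ["~a", "~b", "p"]]
MONO_AND = monot(DEF_AND, {"a", "b"}, "p")
```

On this sample the result has $3 + 2 + 1 = 6$ clauses, matching the bound $n + |A| + 1$ of Theorem 2, and the original input atoms occur only negatively.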


4.3 Instantiation-Based, Proof-Specific Horn Definition 


We present the transformation from monotonic definitions into proof-specific 
Horn definitions. The transformation exploits the duality between predicates’ 
positive and negative definitions. 


Lemma 1 (Duality). Let $p$ be a monotonic predicate over Boolean arguments
$A$. Suppose $\Sigma_p$ and $\Sigma_{\bar{p}}$ are positive and negative propositional
definitions, respectively. For every assignment $M$ to the variables in $A$:


1. $\Sigma_p \models (M \Rightarrow p)$ if and only if $\Sigma_{\bar{p}} \wedge M \wedge p$ is satisfiable.
2. $\Sigma_{\bar{p}} \models (M \Rightarrow \bar{p})$ if and only if $\Sigma_p \wedge M \wedge \bar{p}$ is satisfiable.

The proof of Lemma 1 is in the extended version of this paper. The duality
of the positive ($\Sigma_p$) and negative ($\Sigma_{\bar{p}}$) definitions allows us to
over-approximate positive (negative) definitions by instantiating the negative
(positive) definitions.

Example 3. Returning to Ex. 1 and Fig. 2, consider the assignment
$M = \{a, b, \bar{c}, \bar{d}, e, f, g, h\}$. Since $s$ cannot reach $t$ under this
assignment, any propositional definition $\Sigma_{\overline{\mathit{reach}}_s^t}$ must imply
$M \Rightarrow \overline{\mathit{reach}}_s^t$. Dually,
$\Sigma_{\mathit{reach}_s^t} \wedge M \wedge \overline{\mathit{reach}}_s^t$
is satisfiable, e.g., $\{s, v_1, v_2, \bar{v}_3, \bar{v}_4, \bar{t}\}$.


Lemma 2 (Instantiation-Based Upper-Bound). Let a predicate $p$ over input
$A$ and a positive definition $\Sigma_p$ be given. For any partial assignment $M'$ over
$\mathit{var}(\Sigma_p) \setminus (\mathit{var}(p) \cup A)$,
$\Sigma_p|_{M' \cup \bar{p}} \Rightarrow \bar{p}$ is an over-approximation of
$\Sigma_{\bar{p}}$.


Footnote (Lemma 2): Note that $\Sigma_p|_{M' \cup \bar{p}}$ is encoded in CNF, so to
compactly (i.e., linear-size) encode $\Sigma_p|_{M' \cup \bar{p}} \Rightarrow \bar{p}$ in
CNF, we introduce a new literal $l_i$ for each clause $C_i \in \Sigma_p|_{M' \cup \bar{p}}$,
create clauses $\bar{c}_{ij} \vee l_i$ for each literal $c_{ij} \in C_i$, and add clause
$\bar{l}_1 \vee \bar{l}_2 \vee \ldots \vee \bar{l}_n \vee \bar{p}$.

The proof of Lemma 2 (in the extended paper) relies on the duality in
Lemma 1. Lemma 2 enables upper-bound construction and paves the way for
constructing an instantiation-based Horn upper-bound of a monotonic definition.
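The linear-size CNF encoding of an implication "CNF formula implies literal", as described in the footnote to Lemma 2, can be sketched as follows. The function name `encode_implication` and the selector names `l1, l2, ...` are illustrative choices; literals are strings of the form `"x"` or `"~x"`.

```python
def neg(lit):
    """Complement of a literal written as 'x' (positive) or '~x' (negative)."""
    return lit[1:] if lit.startswith("~") else "~" + lit


def encode_implication(clauses, conclusion, prefix="l"):
    """Encode (C_1 AND ... AND C_n) => conclusion with one selector per clause."""
    encoded = []
    selectors = []
    for i, clause in enumerate(clauses, start=1):
        selector = f"{prefix}{i}"
        selectors.append(selector)
        # Any satisfied literal of C_i forces its selector l_i to be true.
        for lit in clause:
            encoded.append([neg(lit), selector])
    # If every selector is true, the conclusion must hold.
    encoded.append([neg(s) for s in selectors] + [conclusion])
    return encoded


# Two unit clauses (~c and ~d) implying ~reach.
ENCODED = encode_implication([["~c"], ["~d"]], "~reach")
```

The output has one binary clause per literal occurrence plus one final clause, so its size is linear in the size of the instantiated formula.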


Lemma 3 (Instantiation-Based Horn Upper-Bound). Given a monotonic
predicate $p$ over input $A$ and a positive monotonic definition $\Sigma_p^{+}$, let $X$
represent the set of auxiliary variables:
$\mathit{var}(\Sigma_p^{+}) \setminus (A \cup \mathit{var}(p))$. For any complete
satisfying assignment $M_{X \cup A}$ to $\Sigma_p^{+}|_{\bar{p}}$, the formula
$(\Sigma_p^{+}|_{\bar{p} \cup M_X}) \Rightarrow \bar{p}$ serves as a
Horn upper-bound for any propositional definition of $\bar{p}$, where $M_X$ is a partial
assignment derived from $M_{X \cup A}$ for the auxiliary variables $X$.


(Proof in the extended paper.) Note that the instantiation-based Horn
upper-bound of a negative predicate atom $\bar{p}$ is constructed from a monotonic
definition of the positive predicate atom, $\Sigma_p^{+}$, and vice-versa.

For a given theory lemma, the instantiation-based Horn upper-bound
construction (Lemma 3) enables the verification of the theory lemma if we can find
a sufficient “witness” $M_X$ for the instantiation. We now prove that a witness
always exists for every valid theory lemma and does not exist otherwise.


Theorem 3 (Lemma-Specific Horn Upper-Bound). Let a monotonic predicate
$p$ over input $A$, a monotonic definition $\Sigma_p^{+}$, and a lemma in the form
$M_A \Rightarrow \bar{p}$ be given. We denote $X$ as the set of auxiliary variables:
$\mathit{var}(\Sigma_p^{+}) \setminus (A \cup \mathit{var}(p))$. The lemma
$M_A \Rightarrow \bar{p}$ is in the theory of $p$ if and only if there
exists an assignment $M_X$ on $X$ such that: (1) $\Sigma_p^{+}|_{\bar{p} \cup M_X \cup M_A}$
is satisfiable and
(2) $(\Sigma_p^{+}|_{\bar{p} \cup M_X} \Rightarrow \bar{p}) \vdash_1 (M_A \Rightarrow \bar{p})$.


(Proof in the extended paper.) Theorem 3 states that a lemma-specific Horn
upper-bound for a theory lemma $M_A \Rightarrow \bar{p}$ can be constructed by
instantiating the monotonic definition using a “witness” assignment $M_X$. The witness
could be obtained by performing SAT solving on the formula
$\Sigma_p^{+}|_{M_A^{+} \cup \bar{p}}$ (where $M_A^{+}$ is the extension of $M_A$ by
assigning unassigned input variables in $A$ to $\top$ (Sec. 2)). However, in practice,
a better approach is to modify the SMMT solver to produce the witness during the
derivation of theory lemmas. In Section 5, we provide examples of witnesses for
commonly used monotonic predicates.

Note that the witness is not part of the trusted foundation for the proof.
An incorrect witness might not support verification of a theory lemma, but if
a theory lemma is verified using a specific witness $M_X$, Theorem 3 guarantees
that the lemma is valid.


Example 4. Continuing the example, let a theory lemma
$L := c \vee d \vee \overline{\mathit{reach}}_s^t$ be given. To derive a lemma-specific
Horn upper-bound for $\Sigma_{\overline{\mathit{reach}}_s^t}$, we first obtain a witness
$M_X$ by finding a satisfying assignment to the formula
$\Sigma^{+}_{\mathit{reach}_s^t} \wedge M \wedge \overline{\mathit{reach}}_s^t$, where
$M := \{a, b, \bar{c}, \bar{d}, e, f, g, h\}$ (by assigning the unassigned
input variables in $L$ to $\top$). Since $M$ is a complete assignment to the edge
variables, the graph is fully specified, and a suitable witness $M_X$ can be efficiently
computed using a standard graph-reachability algorithm, to compute the
reachability status of each vertex. The witness $M_X$ is
$\{s, v_1, v_2, \bar{v}_3, \bar{v}_4, \bar{t}\}$. Following the construction in Theorem 3,
the formula $\Sigma^{+}_{\mathit{reach}_s^t}|_{\overline{\mathit{reach}}_s^t \cup M_X}$
simplifies to two (unit) clauses: $\bar{c}$ and $\bar{d}$ (from clauses (2) and (6) in
Ex. 1), which can be visualized as the cut in Fig. 2 (right). The lemma-specific Horn
upper bound
$\Sigma^{+}_{\mathit{reach}_s^t}|_{\overline{\mathit{reach}}_s^t \cup M_X} \Rightarrow \overline{\mathit{reach}}_s^t$
is, therefore, $\bar{c} \wedge \bar{d} \Rightarrow \overline{\mathit{reach}}_s^t$, which in
this example is already CNF, but more generally, we would introduce two literals to
encode the implication:
$\{c \vee l_1,\ d \vee l_2,\ \bar{l}_1 \vee \bar{l}_2 \vee \overline{\mathit{reach}}_s^t\}$.
The lemma-specific Horn upper-bound is dual-Horn and implies the theory lemma $L$ by
unit propagation.

Footnote (on the witness $M_X$ in Theorem 3): Instead of instantiating a complete
assignment on every auxiliary variable in $X$, a partial instantiation is sufficient so
long as it determines the assignments on the other variables.
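The witness computation in Example 4 amounts to a graph search followed by reading off the edges that leave the reachable set. The following Python sketch illustrates this under assumptions: the edge table `EDGES` is reconstructed from clauses (1) to (8) of Example 1 (with $e$ assumed to go from $v_3$ to $v_2$), and `lemma_specific_cut` is a hypothetical helper, not part of MonoSAT or MonoProof.

```python
from collections import deque

# Edge name -> (source vertex, target vertex), as read off Example 1.
EDGES = {
    "a": ("s", "v1"), "b": ("s", "v2"), "c": ("v1", "v3"), "d": ("v2", "v4"),
    "e": ("v3", "v2"), "f": ("v4", "v3"), "g": ("v4", "t"), "h": ("v3", "t"),
}


def reachable(enabled, source="s"):
    """Breadth-first search over the enabled edges; returns the reached vertices."""
    seen = {source}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for name in enabled:
            src, dst = EDGES[name]
            if src == u and dst not in seen:
                seen.add(dst)
                queue.append(dst)
    return seen


def lemma_specific_cut(disabled, target="t"):
    """Unit clauses left after instantiating the definition with the witness.

    `disabled` holds the edges the lemma assumes to be off; all other edges
    are enabled. Returns None if the target is still reachable, i.e. no
    witness exists and the lemma is not valid.
    """
    enabled = set(EDGES) - set(disabled)
    seen = reachable(enabled)  # the witness: reached vertices are true
    if target in seen:
        return None
    # A clause ~reach^i | ~edge | reach^j with i reached and j unreached
    # reduces to the unit clause ~edge.
    return sorted("~" + name for name, (src, dst) in EDGES.items()
                  if src in seen and dst not in seen)


CUT_FOR_L = lemma_specific_cut({"c", "d"})
```

For the lemma of Example 4 the function returns the two unit clauses for $\bar{c}$ and $\bar{d}$. Passing a larger disabled set such as `{"c", "d", "g"}` yields the same cut, since only edges on the boundary of the reachable set matter.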


From the lemma-specific Horn upper-bounds, we construct the proof-specific 
Horn definition by combining the lemma-specific Horn upper-bounds for all lem- 
mas in the proof obligations. 

In summary, to efficiently verify SMMT theory lemmas, we propose the
following approach: (1) define the propositional definitions (in CNF) for the atoms
of theory predicates; (2) transform the definitions into monotonic definitions
offline; (3) during proof checking, approximate a proof-specific Horn definition (if
not already Horn) from the constructed monotonic definition using theory
lemmas in the proof; (4) combine the proof-specific definitions with the CNF of the
input instance and verify the proof via RUP. The only theory-specific, trusted
foundation for the proof is the definition for the theory atoms. (The extended version
of this paper contains a figure to help visualize this workflow.)


Example 5. Summarizing, the positive propositional definition
$\Sigma_{\mathit{reach}_s^t}$ in Ex. 1 is already Horn, so is sufficient for verifying
via DRAT any SMMT lemmas that imply $\mathit{reach}_s^t$. To verify lemmas that imply
$\overline{\mathit{reach}}_s^t$, we can compute a proof-specific definition of
$\overline{\mathit{reach}}_s^t$ from $\Sigma_{\mathit{reach}_s^t}$ using Theorem 3.


Remark 1. The only trusted basis of our approach is the set of propositional
definitions of theory atoms. For the monotonic theories in Section 5, we considered
the definitions intuitively understandable, and therefore sufficiently trustworthy.
But to further increase confidence, propositional definitions can be validated
using techniques from hardware validation/verification, e.g., simulation to
sanity-check general behavior, equivalence checking against known-good circuits, etc.


5 Example Propositional Definitions 


In this section, we illustrate the monotonic definitions for the most commonly 
used monotonic predicates. Due to space constraints, we present only graph 
reachability here in detail, and only sketch bit-vector comparison and summa- 
tion, and max-flow. Full definitions for those theories are in the extended version 
of this paper. 

Graph Reachability: Given a graph $G = (V, E)$ where $V$ and $E$ are sets
of vertices and edges, respectively (as discussed earlier in the paper), the graph
reachability theory contains the reachability predicate $\mathit{reach}_u^v$ for
$u, v \in V$ over input $e_1, e_2, \ldots, e_n \in E$. For convenience, we refer to the
positive edge atom for the edge from vertex $i$ to vertex $j$ as $e_{i \to j}$. The
predicate is positively monotonic for $E$, and the monotonic definition for the
positive predicate atom $\mathit{reach}_u^v$ contains the clauses:


1. $\overline{\mathit{reach}^i} \vee \bar{e}_{i \to j} \vee \mathit{reach}^j$ for every edge $e_{i \to j} \in E$, and the unit clause $\mathit{reach}^u$
2. $\overline{\mathit{reach}^v} \vee \mathit{reach}_u^v$


The monotonic definition introduces a reachability atom $\mathit{reach}^i$ for every
$i \in V$ and asserts the fact that $u$ is reachable from itself. For every edge $(i, j)$, if
the edge $(i, j)$ is enabled ($e_{i \to j}$) and $i$ is reachable ($\mathit{reach}^i$),
then $j$ must also be reachable ($\mathit{reach}^j$). The predicate atom
$\mathit{reach}_u^v$ is implied by the reachability of $v$ ($\mathit{reach}^v$). The
definition is monotonic since it only contains negative edge atoms. Moreover, the
definition is already a Horn definition and can be used directly for proving theory
lemmas in the theory of $\mathit{reach}_u^v$ without the need for transformation into a
proof-specific Horn definition. The size of the definition is $O(|E|)$.

Instead of defining the monotonic definition for the negative predicate atom
$\overline{\mathit{reach}}_u^v$, we construct its proof-specific definition from the
monotonic definition of the positive predicate atom $\mathit{reach}_u^v$. For each
theory lemma in the proof, the witness for constructing the lemma-specific Horn
upper-bound is the reachability status ($\mathit{reach}^i$) of every vertex $i \in V$,
which is efficiently computed in the SMMT solver using standard graph-reachability
algorithms.
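The clause schema for graph reachability is easy to generate programmatically. The sketch below is only illustrative; the literal naming scheme (`reach^i`, `e_i->j`, `reach_u^v`) and the function name `reach_definition` are choices made here, not the format used by MonoProof.

```python
def reach_definition(edges, u, v):
    """Monotonic (and Horn) definition of reach_u^v for a list of (i, j) edges."""
    clauses = [
        # If i is reachable and edge i->j is enabled, then j is reachable.
        [f"~reach^{i}", f"~e_{i}->{j}", f"reach^{j}"]
        for i, j in edges
    ]
    clauses.append([f"reach^{u}"])                    # the source reaches itself
    clauses.append([f"~reach^{v}", f"reach_{u}^{v}"])  # v reachable => predicate
    return clauses


def is_horn(clauses):
    """At most one positive literal per clause."""
    return all(sum(not lit.startswith("~") for lit in c) <= 1 for c in clauses)


SMALL_DEF = reach_definition([("s", "v1"), ("v1", "t"), ("s", "t")], "s", "t")
```

The result contains one clause per edge plus two, which matches the stated $O(|E|)$ size, and edge atoms occur only negatively.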

Bit-Vector Comparison (sketch): The positive definition is just the Tseitin
encoding of a typical bit-vector comparison circuit, with some simplification due
to being one-sided: For each bit position $i$, we introduce auxiliary variables $ge_i$
and $gt_i$, which indicate that the more-significant bits from this position have
already determined vector $\vec{a}$ to be $\geq$ or $>$ $\vec{b}$, respectively. Simple
clauses compute $ge_{i-1}$ and $gt_{i-1}$ from $ge_i$ and $gt_i$ and the bits at
position $i - 1$ of $\vec{a}$ and $\vec{b}$. The negative definition is similar. These
are both Horn, so can be used without further transformation into proof-specific Horn
definitions.
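One plausible reading of the $ge_i$ / $gt_i$ recurrence is sketched below as a plain Boolean computation, scanning from the most-significant bit down. This is an assumption made for illustration: the exact clauses are given only in the extended version of the paper, and the update rules here are a standard comparator recurrence rather than the authors' encoding.

```python
def to_bits(value, width):
    """Most-significant-bit-first list of booleans for a non-negative integer."""
    return [bool((value >> i) & 1) for i in reversed(range(width))]


def compare(a_bits, b_bits):
    """Return (ge, gt): whether a >= b and a > b, scanning MSB to LSB."""
    ge, gt = True, False  # before any bit is seen, the prefixes are equal
    for a, b in zip(a_bits, b_bits):
        # Both new values are computed from the old (ge, gt) pair.
        gt, ge = (
            gt or (ge and a and not b),     # strictly greater already, or decided now
            gt or (ge and (a or not b)),    # still not smaller after this bit
        )
    return ge, gt
```

For example, `compare(to_bits(5, 3), to_bits(3, 3))` yields `(True, True)`, because the most-significant bit already decides the comparison.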

Bit-Vector Summation and Comparison (sketch): These are basically
Tseitin encodings of ripple-carry adders, combined with the comparison theory
above, using Def. 6 to handle the fact that the Tseitin encodings of the
XOR gates in the adders are non-monotonic with respect to the input bit-vectors.
The resulting propositional definitions are not Horn, so we use witnesses to
construct lemma-specific Horn definitions. The witnesses come from the SMMT
solver maintaining lower and upper bounds on the possible values of the
bit-vectors, e.g., a witness for $\sum A \geq \sum B$ consists of lower bounds for the
vectors in $A$ and upper bounds for the vectors in $B$ such that their sums make the
inequality true. (Mutatis mutandis for the negative witness.)

Max-Flow (sketch): For the positive definition (that the max-flow exceeds
some value), we introduce auxiliary bit-vectors to capture the flow assigned to
each edge. We use the bit-vector theories to ensure that the flows do not exceed
the edge capacities, that each node's (except the source) outgoing flows do not
exceed the incoming flows (equality is unnecessary due to the one-sidedness), and
that the flow to the sink exceeds the target value. For the negative definition, we
exploit the famous max-flow/min-cut duality. We introduce an auxiliary variable
$\mathit{incut}_e$ for each edge. We use the graph reachability theory to ensure that the
edges in the cut separate the source from the sink, and the bit-vector summation
theory to ensure that the capacity of the cut does not exceed the target max-flow
value. Neither the positive nor the negative definition is Horn, so both require
instantiation-based upper-bounds. The witnesses are the flow values or the cuts,
and are easily computed by the SMMT solver.


6 Experimental Evaluation 


To evaluate our proposed method, we implemented it as shown earlier in Fig. 1
(Sec. 3). We call our implementation MonoProof (available at
https://github.com/NickF0211/MonoProof).

The two basic questions of any proof-generating SAT/SMT solver are: (1) 
how much overhead does the support for proofs add to the solving time, and 
(2) how efficiently can a proof be prepared from the proof log, and verified? 
For the first question, we compare the runtime of unmodified MonoSAT ver- 
sus the MonoSAT that we have extended to produce proof certificates. For the 
second question, we need a baseline of comparison. MonoProof is the first proof- 
generating SMMT solver, so there is no obvious comparison. However, since 
SMMT theories are finite-domain, and bit-blasting (i.e., adding clauses that 
encode the theory predicates to the problem instance and solving via a proposi- 
tional SAT solver) is a standard technique for finite-domain theories, we compare 
against bit-blasting. Arguably, this comparison is unfair, since MonoSAT out- 
performs bit-blasting when solving SMMT theories [9]. Thus, as an additional 
baseline, we propose an obvious hybrid of SMMT and bit-blasting, which we dub 
Lemma-Specific Bit-Blasting (LSBB): we run MonoProof until the core theory
lemmas have been extracted, benefitting from MonoSAT's fast solving time, but
then instead of using our techniques from Sec. 4, we bit-blast only the core theory
lemmas (footnote 5).

We ran experiments on 3 GHz AMD Epyc 7302 CPUs with 512 GB of DDR4
RAM, with a timeout of 1 hour and memory limit of 64 GB. For the bit-blasting
SAT solver, we use the state-of-the-art SAT solver Kissat [13]. In all cases, the
proof is verified with standard DRAT-trim [37].


6.1 Benchmarks 


We wish to evaluate scalability on real, industrial problems arising in practice.
MonoProof has successfully generated and verified industrial UNSAT proofs for
a set of hard, unsatisfiable Tiros queries collected in production use at AWS
over a multi-week period. However, these instances are proprietary and cannot
be published, making them irreproducible by others. Instead, we evaluate on two
sets of benchmarks that we can publicly release (also at
https://github.com/NickF0211/MonoProof):

Footnote 5: We implemented this both via separate SAT calls per lemma, and also by
providing all lemmas in a single SAT call (with auxiliary variables to encode the
resulting DNF), to allow the solver to re-use learned clauses on different lemmas. The
latter approach generally worked better, so we report those results, but (spoiler)
neither worked well.


Network Reachability Benchmarks. These are synthetic benchmarks that 
mimic the real-world problems solved by Tiros, without disclosing any propri- 
etary information. Network reachability is the problem of determining whether a 
given pair of network resources (source and destination) can communicate. The 
problem is challenging because network components can intercept, transform, 
and optionally re-transmit packets traveling through the network (e.g., a fire- 
wall or a NAT gateway). Network components come in various types, each with 
their own complex behaviors and user-configurable network controls. In these 
benchmarks, we abstract to two types of intermediate components: simple and 
transforming. Simple components relay an incoming packet as long as its des- 
tination address belongs to a certain domain, expressed in terms of a network 
CIDR (Classless Interdomain Routing), e.g., 10.0.0.0/24. Transforming network 
components intercept an incoming packet and rewrite the source address and 
ports to match their own before re-transmitting it. The simple network compo- 
nents are akin to subnets, VPCs, and peering connections; transforming network 
components are a highly abstracted version of load balancers, NAT gateways, 
firewalls, etc. The SMT encoding uses the theories of bit vectors and of graph 
reachability. The network packets are symbolically represented using bit vectors, 
and the network is modeled as a symbolic graph. Network behavior is modeled 
as logical relations between packets and elements in the network graph. Unsatis- 
fiability of a query corresponds to unreachability in the network: for all possible 
packet headers that the source could generate, and for all possible paths connect- 
ing the source to the destination, the combined effect of packet transformations 
and network controls placed along the path cause the packet to be dropped from 
the network before it reaches its destination. 
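The relay rule for a simple component (forward a packet only if its destination lies in a given CIDR block) reduces to a prefix comparison on the address bits, which is why bit vectors are a natural encoding. A small Python illustration using the standard `ipaddress` module follows; the function names are hypothetical and the benchmark generator itself is not shown in the paper.

```python
import ipaddress


def relays(dst, cidr):
    """Would a simple component configured with `cidr` forward a packet to `dst`?"""
    return ipaddress.ip_address(dst) in ipaddress.ip_network(cidr)


def relays_bitwise(dst, cidr):
    """The same check as a comparison of the leading prefix bits (IPv4 only)."""
    net = ipaddress.ip_network(cidr)
    shift = 32 - net.prefixlen  # number of host bits to ignore
    return (int(ipaddress.ip_address(dst)) >> shift) == (int(net.network_address) >> shift)
```

With the block `10.0.0.0/24`, the address `10.0.0.42` is relayed and `10.0.1.42` is not; both functions agree because only the top 24 bits are compared.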


We generated 24 instances in total, varying the size and structure of the 
randomly generated network. Graph sizes ranged from 1513 to 15524 (average 
5485) symbolic edges. 


Escape Routing Benchmarks. Escape routing is the problem of routing all 
the signals from a component with extremely densely packed I/O connections 
(e.g., the solder bumps on a Ball-Grid Array (BGA)) to the periphery of the com- 
ponent, where other routing techniques can be used. For a single-layer printed 
circuit board (PCB), escape routing is optimally solvable via max-flow, but real 
chips typically require multiple layers. The multi-layer problem is difficult be- 
cause the vias (connections between layers) are wider than the wires on a layer, 
disrupting what routes are possible on that layer. Bayless et al. [11] proposed a
state-of-the-art solution using SMMT: max-flow predicates determine routability 
for each layer on symbolic graphs, whose edges are enabled/disabled by logical 
constraints capturing the design rules for vias. 
Fig. 3. Cactus Plots for Solving (left) and Proof Preparation&Checking (right). Each
point is the runtime for one instance, so the plot shows the number of instances (x- 
axis) that ran in less than any time bound (y-axis). BB denotes standard bit-blasting; 
LSBB, lemma-specific bit-blasting; and MonoProof is our new method. The left graph 
shows that MonoProof (and LSBB, which uses MonoProof's solver) is vastly faster 
than bit-blasting for solving the instances. The right graph shows that MonoProof is 
also vastly faster than bit-blasting for proving the result; LSBB timed-out on all proofs. 


In [11], 24 commercial BGAs were analyzed under two different via
technologies and different numbers of layers. For our benchmark set, we select all
configurations where the provable minimum number of layers was reported. This
results in 24 unsatisfiable SMMT problem instances (routing with one fewer
layer than the minimum), which exercise the bit-vector and max-flow theories.
Graph sizes ranged from 193994 to 3084986 (average 717705) symbolic edges.


6.2 Results 


Returning to the two questions for our evaluation: 

1. The solver overhead of our proof certificate generation is minimal. On the 
network reachability benchmarks, the geometric mean (GM) runtime overhead 
was 14.10% (worst case 28.8%). On the escape routing benchmarks, the GM
runtime overhead was only 1.11% (worst case 5.71%). (The lower overhead
is because MonoSAT spent more time learning theory lemmas vs. recording 
them in the proof.) The overall GM runtime overhead across all benchmarks 
was 7.41%. These overhead figures are comparable to state-of-the-art, proof- 
generating SAT solvers, which is not surprising, since our proof certificates are 
essentially the same as a DRAT proof certificate in SAT. This compares favorably 
with the solver overhead of heavier-weight, richer, and more expressive SMT 
proof certificates like LFSC [34]. 

2. MonoProof's time to prepare and check a proof of unsatisfiability is markedly 
faster than standard bit-blasting or lemma-specific bit-blasting. Fig. 3
summarizes our results. (A full table is in the extended version of this paper.) The left
graph shows solving times (with proof logging). Since the proof-logging
overhead is so low for both bit-blasting (Kissat generating DRAT) and MonoProof,
these results are consistent with prior work showing the superiority of the SMMT 
approach for solving [9]. Note that bit-blasting (BB) solved all 24 network reach- 
ability instances, but failed to solve any of the 24 escape routing instances in the 
1hr timeout. Lemma-specific bit-blasting (LSBB) and MonoProof share the same 
solving and proof-logging steps. The right graph shows proof-checking times (in- 
cluding BackwardCheck and proof-specific Horn upper-bound construction for 
MonoProof). Here, BB could proof-check only 11/24 reachability instances that 
it had solved. Restricting to only the 11 instances that BB proof-checked, Mono- 
Proof was at least 3.7x and geometric mean (GM) 10.2x faster. LSBB timed out 
on all 48 instances. Summarizing, MonoProof solved and proved all 48 instances, 
whereas BB managed only 11 instances, and LSBB failed to prove any. 

The above results were with our modified BackwardCheck enabled
(drat-trim-theory in Fig. 1). Interestingly, with BackwardCheck disabled, MonoProof
ran even faster on 37/48 benchmarks (min speedup 1.03x, max 6.6x, GM 1.7x). 
However, enabling BackwardCheck ran faster in 10/48 cases (min speedup 1.02x, 
max 7.9x, GM 1.6x), and proof-checked one additional instance (69 sec. vs. 1hr 
timeout). The modified BackwardCheck is a useful option to have available. 
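For readers who want to reproduce summary statistics of this kind from per-instance runtimes, the following sketch shows one common convention for a geometric-mean overhead: take the geometric mean of the per-instance runtime ratios and report the excess over 1 as a percentage. Whether the paper's figures were computed in exactly this way is an assumption, and the helper names are hypothetical.

```python
import math


def geometric_mean(values):
    """Geometric mean of positive numbers, computed via logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def gm_overhead_percent(base_times, proof_times):
    """GM of per-instance ratios (with proofs / without), as a percentage overhead."""
    ratios = [with_proof / base for base, with_proof in zip(base_times, proof_times)]
    return (geometric_mean(ratios) - 1.0) * 100.0


def gm_speedup(slow_times, fast_times):
    """GM of per-instance speedups (slow / fast)."""
    return geometric_mean([slow / fast for slow, fast in zip(slow_times, fast_times)])
```

The geometric mean is the usual choice for ratios because it treats a 2x slowdown and a 2x speedup symmetrically, which an arithmetic mean of ratios does not.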


7 Conclusion 


We have introduced the first efficient proof-generating method for SMMT. Our 
approach uses propositional definitions of the theory semantics and derives com- 
pact, proof-specific Horn-approximations sufficient to verify the theory lemmas 
via RUP. The resulting pure DRAT proofs are checkable via well-established (and 
even machine verified) tools. We give definitions for the most common SMMT 
theories, and experimental results on industrial-scale problems demonstrate that 
the solving overhead is minimal, and the proof preparation and checking times 
are vastly faster than the alternative of bit-blasting. 

The immediate line of future work is to support additional finite domain
monotonic theories, such as richer properties on pseudo-boolean reasoning. We
also aim to apply our approach to support monotonic theories beyond finite
domains. In addition, we plan to extend our proof support to emerging proof
formats such as LRAT [18] and FRAT [3] that enable faster proof checking.
Abstract. Z3-Noodler is a fork of Z3 that replaces its string theory solver
with a custom solver implementing the recently introduced stabilization-based
algorithm for solving word equations with regular constraints. An extensive
experimental evaluation shows that Z3-Noodler is a fully-fledged solver that can
compete with state-of-the-art solvers, surpassing them by far on many
benchmarks. Moreover, it is often complementary to other solvers, making it a suitable
choice as a candidate for a solver portfolio.


1 Introduction 


Recently, many tools for solving string constraints have been developed, motivated 
mainly by techniques for finding security vulnerabilities such as SQL injection or cross- 
site scripting (XSS) in web applications (34/35/BG]. String solving has also found its 
applications in, e.g., analysis of access user policies in Amazon Web Services [28/8]B9] 
or smart contracts [7]. Solvers for string constraints are usually implemented as string 
theory solvers inside SMT solvers, such as cvc5 [9] or Z3 [BT], allowing combination 
with other theories, most commonly the theory of integers for string lengths. Other 
well known string solvers include Z3str3RE [13,12], Z3-Trau [M], Z3str4 [30],
OSTRICH [19], and others.

In this paper, we present Z3-Noodler 1.0.0 [47], a fork of Z3 4.12.2 where the
string theory solver is replaced with the stabilization-based procedure for solving string
(dis)equations with regular and length constraints [14,20]. The procedure makes heavy
use of nondeterministic finite automata (NFAs) and operations over them, for which we
use the efficient Mata library for NFAs [23,29].

The presented version implements multiple improvements over a previous
Z3-Noodler prototype from [20]. Firstly, it extends the support for string predicates from
the SMT-LIB string theory standard [11] by (1) applying smarter and more specific
axiom saturation and (2) adding support for their solving inside the decision procedure
(e.g., for the contains predicate). It also implements various optimizations (e.g., for
regular constraints handling) and other decision procedures, e.g., the Nielsen
transformation for quadratic equations and a procedure for regular language (dis)equations;
moreover, we added heuristics for choosing the best decision procedure to use.

We compared Z3-Noodler with other string solvers on standard SMT-LIB
benchmarks [10,42,43]. The results indicate that Z3-Noodler is competitive, superior
especially on benchmarks containing mostly regular constraints and word (dis)equations, and
that the improvements since [20] had a large impact on the number of solved instances
as well as its overall performance.


© The Author(s) 2024 
2 Architecture


Z3-Noodler replaces the string theory solver in the DPLL(T)-based SMT solver Z3
(version 4.12.2) with our string solver Noodler [14], which is based on the stabilization
algorithm (cf. Section 3). DPLL(T)-based solvers in general combine a SAT solver
providing satisfying assignments to the Boolean skeleton of a formula with multiple
theory solvers for checking conjunctions of theory literals.

Z3-Noodler still uses the infrastructure of Z3, most importantly the parser, string
theory rewriter and the linear integer arithmetic (LIA) solver. The Z3 parser takes
formulae in the SMT-LIB format [10], where Z3-Noodler can handle nearly all
predicates/functions (such as substr, len, at, replace, regular membership, word
equations, etc.) in the string theory as defined by SMT-LIB [11].

Even though we do use the string theory rewriter of Z3, we disabled those rewritings 
that do not benefit our core string solver. For instance, we removed rules that rewrite 
regular membership constraints to other types of constraints since solving regular con- 
straints and word equations using our stabilization-based approach is efficient. 

The interaction of the Noodler solver with Z3 is shown in Fig. 1 and works as
follows. Upon receiving a satisfying Boolean assignment from the SAT solver, we first
remove irrelevant assignments (using Z3's relevancy propagation), which allows us to
work with smaller instances and return more general theory lemmas. A theory
assignment obtained from the Boolean assignment consists of string (dis)equations,
regular constraints, and, possibly, predicates that were not axiom-saturated before
(cf. Section 3).

The core Noodler string decision procedure then reduces the conjunction of string
literals to a LIA constraint over string lengths, and returns it to Z3 as a theory lemma,
to be solved together with the rest of the input arithmetic constraints by Z3's internal
LIA solver. Noodler implements several decision procedures (discussed in
Section 3), heavily employing the Mata automata library (version 0.109.0). As
an optimization of the theory lemma generation, when the string constraint reduces into
a disjunction of LIA length constraints, we check the satisfiability of individual
disjuncts (generated lazily on demand) separately in order to get a positive answer as soon
as possible. For testing the disjuncts, the current solver context is cloned and queried
about satisfiability of the LIA constraint conjoined with the disjunct.


Fig. 1: Architecture of Z3-Noodler


3 String Theory Core 


In this section, we provide details about Z3-Noodler's string theory implementation,
including initial axiom saturation, preprocessing, the core procedure, and limitations.


Axiom Saturation. In order to best utilize the power of Z3's internal LIA solver during the
generation of a satisfiable assignment, we saturate the input formula with length-aware
theory axioms and axioms for string predicates (this happens during Z3's processing
of the input formula, before the main SAT solver starts generating assignments). We
can then avoid checking SAT assignments that trivially violate length conditions. Most
importantly, we add length axioms len(t₁) ≥ 0 and len(t₁.t₂) = len(t₁) + len(t₂), where
t₁, t₂ are arbitrary string terms, and len(t₁) = len(t₂) for the word equation t₁ = t₂.
Moreover, for string functions/predicates, Noodler saturates the original formula
with an equivalent formula composed of word (dis)equations and length/regular
constraints, which are more suitable for our core procedure (e.g., for a contains(s, "abc")
in the input formula, we add the regular constraint s ∈ Σ*abcΣ*). We use different
saturation rules for instances of predicates with concrete values. For instance, for
substr(s, 4, 1), we add just the term at(s, 4). On the other hand, for substr(s, tᵢ, tⱼ),
where s is a string term and tᵢ, tⱼ are general integer terms (possibly containing
variables), we need to add a more general formula talking about the prefix and suffix of s
of given lengths. The original predicate occurrence is then removed from received
assignments by Noodler (Z3 does not allow removing parts of the original formula).
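The saturation rule for contains can be illustrated with a small check. This is only a sketch under stated assumptions: it uses Python's re module in place of Noodler's automata, and the helper names contains_as_regex and satisfies are hypothetical, not part of Z3-Noodler.

```python
import re

def contains_as_regex(needle: str) -> re.Pattern:
    """Build the regular constraint  Σ* needle Σ*  used for contains(s, needle)."""
    # With re.DOTALL, '.' matches every character, so '.*' plays the role of Σ*.
    return re.compile(".*" + re.escape(needle) + ".*", re.DOTALL)

def satisfies(s: str, needle: str) -> bool:
    """True iff s belongs to the language Σ* needle Σ*."""
    return contains_as_regex(needle).fullmatch(s) is not None

# The regular constraint agrees with the contains predicate on sample strings.
for sample in ["abc", "xxabcxx", "ab", "a\nabc", "", "cba"]:
    assert satisfies(sample, "abc") == ("abc" in sample)
```

The point of the rewriting is that membership in Σ*abcΣ* is a plain regular constraint, which the automata-based core handles directly.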


Decision Procedures. Z3-Noodler's string theory core contains several complementary
decision procedures. The main one is the stabilization-based algorithm for solving word
equations with regular constraints introduced in [14] and later extended with efficient
handling of length constraints and disequations [20]. The stabilization-based algorithm
starts, for every string variable, with an NFA encoding regular constraints on the variable
and iteratively refines the NFA according to the word equations until the stability
condition is achieved. The stability condition holds when, for every word equation,
the language of the left-hand side (obtained as the language of the concatenation of
NFAs for variables and string literals) equals the language of the right-hand side. When
stability is achieved, length constraints of the solutions are generated and passed to the
LIA solver. The algorithm is complete for the chain-free [5] combinations of equations,
regular and length constraints, together with unrestricted disequations, making it the
largest known decidable fragment of these types of constraints.
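The stability condition can be made concrete with a toy check. The sketch below is an illustration only: it assumes finite languages stored as Python sets instead of NFAs (so it cannot represent infinite languages such as Σ*), it does not perform the refinement step, and the names concat_language and is_stable are hypothetical.

```python
from itertools import product

def concat_language(side, lang):
    """Language of a concatenation of variables and string literals.

    `lang` maps variable names to finite sets of words; any symbol that is
    not a key of `lang` is treated as a string literal.
    """
    parts = [lang.get(sym, {sym}) for sym in side]
    return {"".join(words) for words in product(*parts)}

def is_stable(equations, lang):
    """Stability: for every equation, both sides denote the same language."""
    return all(
        concat_language(lhs, lang) == concat_language(rhs, lang)
        for lhs, rhs in equations
    )

# Equation x.y = z with x in {a}, y in {b, bb}.
equations = [(["x", "y"], ["z"])]
stable = {"x": {"a"}, "y": {"b", "bb"}, "z": {"ab", "abb"}}
unstable = {"x": {"a"}, "y": {"b", "bb"}, "z": {"ab", "abb", "c"}}
assert is_stable(equations, stable)
assert not is_stable(equations, unstable)
```

In the unstable case a refinement step would remove the word "c" from the language of z, which is the kind of NFA refinement the procedure iterates until the check succeeds.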

The stabilization-based decision procedure starts by inductively converting the initial 
regular constraints into NFAs. During the construction, we utilize eager simulation-based 
reduction with on-demand determinization and minimization. 

For an efficient handling of quadratic equations (systems of equations with at most
two occurrences of each variable) with lengths, Noodler implements a decision
procedure based on the Nielsen transformation [32]. The algorithm constructs a graph
corresponding to the system and reasons about it to determine if the input formula is
satisfiable or not [38,22]. If the system contains length variables, we also create a counter
automaton corresponding to the Nielsen graph (in a similar way as in [28]). In the
subsequent step, we contract edges, saturating the set of self-loops and, finally, we iteratively
generate flat counter sub-automata (a flat counter automaton only allows cycles that
are self-loops), which are later transformed into LIA formulae describing lengths of all
possible solutions.
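A minimal sketch of the graph exploration behind the Nielsen transformation, for a single quadratic equation without length constraints. It is a textbook-style illustration, not Noodler's implementation: the convention that uppercase characters are variables and lowercase characters are letters, and the function names, are assumptions made for this example. Termination relies on the equation being quadratic, because then no rewriting step increases its size.

```python
from collections import deque

def is_var(sym):
    return sym.isupper()  # assumed convention: uppercase = variable

def trim(lhs, rhs):
    """Drop the longest common prefix of both sides."""
    i = 0
    while i < len(lhs) and i < len(rhs) and lhs[i] == rhs[i]:
        i += 1
    return lhs[i:], rhs[i:]

def substitute(side, var, repl):
    """Replace every occurrence of `var` in `side` by the symbols in `repl`."""
    out = []
    for sym in side:
        out.extend(repl if sym == var else (sym,))
    return tuple(out)

def nielsen_sat(lhs, rhs):
    """Breadth-first search of the Nielsen graph; SAT iff (ε, ε) is reachable."""
    start = trim(tuple(lhs), tuple(rhs))
    seen, queue = {start}, deque([start])
    while queue:
        l, r = queue.popleft()
        if not l and not r:
            return True
        heads = (l[0] if l else None, r[0] if r else None)
        moves = []
        for x, other in (heads, heads[::-1]):
            if x is None or not is_var(x):
                continue
            moves.append((x, ()))              # rule x := ε
            if other is not None:
                moves.append((x, (other, x)))  # rule x := other . x
        for var, repl in moves:
            node = trim(substitute(l, var, repl), substitute(r, var, repl))
            if node not in seen:
                seen.add(node)
                queue.append(node)
    return False

assert nielsen_sat("Xa", "aX")      # X = a^n is a solution
assert not nielsen_sat("Xa", "bX")  # no solution exists
```

The nodes visited by the search correspond to the graph mentioned above; handling lengths additionally requires tracking how each edge changes the variable lengths, which this sketch omits.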

In order to solve (dis)equations of regular expressions, we reduce the problem to
reasoning about the corresponding NFAs (similarly as for regular constraints handling).
In particular, we use efficient NFA equivalence and universality checking from Mata,
which implements advanced antichain-based algorithms [46,6].


Z3-NOODLER: An Automata-Based String Solver 27 


Preprocessing. Each decision procedure employs a sequence of preprocessing rules
transforming the string constraint to a more suitable form. Our portfolio of rules includes
transformations reducing the number of equations by a conversion to regular constraints,
propagating epsilons and variables over equations, underapproximation rules, and rules
reducing the number of disequations (cf. [20]). On top of that, Z3-Noodler employs
information about length-equivalent variables, which allows it to infer simpler constraints
(e.g., for xy = zw with len(x) = len(z), we can infer y = w). Z3-Noodler also checks for
simple unsatisfiable patterns for early termination. A sequence of preprocessing rules is
composed for each of the decision procedures differently, maximizing their strengths.


Supported String Predicates and Limitations. Z3-Noodler currently supports handling
of basic string predicates replace, substr, at, indexof, prefix, suffix, contains,
and a limited support for ¬contains. From the set of extended constraints, the core
solver currently does not support the replace_all function (and variants of replacement
based on regular expressions) and to/from_int conversions. The decision procedures
used in Z3-Noodler make it complete for the chain-free fragment with unbounded
disequations and regular constraints [20], and quadratic equations. Outside this fragment,
our theory core is sound but incomplete.


4 Experiments 


Tools and environment. We compared Z3-Noodler with the following state-of-the-art
tools: cvc5 [9] (version 1.0.8), Z3 (version 4.12.2), Z3str3RE [13,12], Z3str4 [30],
OSTRICH, and Z3-Noodlerᵖʳ (version 0.1.0 used in [20]). We did not compare
with Z3-Trau [2] as it is no longer under active development and gives incorrect results
on newer benchmarks. The experiments were executed on a workstation with an Intel
Xeon Silver 4314 CPU @ 2.4 GHz with 128 GiB of RAM running Debian GNU/Linux.
The timeout was set to 120 s and the memory limit to 8 GiB.


Benchmarks. The benchmarks come from the SMT-LIB repository, specifically
categories QF_S and QF_SLIA [43]. These benchmarks were also used in
SMT-COMP'23 [41], in which Z3-Noodler participated (version 0.2.0). As Z3-Noodler
does not support to/from_int conversions and replace_all-like predicates, we
excluded formulae whose satisfiability checking needs their support. Based on the
occurrences of different kinds of constraints, we divide the benchmarks into three groups:


Regex This category contains formulae with dominating regular membership and
    length constraints. It consists of AutomatArk [13], Denghang, StringFuzz [15],
    and Sygus-qgen benchmark sets. We excluded 1,568 formulae from StringFuzz
    that require support of the to_int predicate.

Equations The formulae in this category consist mostly of word equations with length
    constraints and a small amount of other predicates. It contains Kaluza [40,27],
    Kepler [25], Norn [3], Slent [44], Slog [45], Webapp, and Woorpje benchmark
    sets. We excluded 414 formulae from Webapp that require support of replace_all,
    replace_re, and replace_re_all predicates.


3 Latest commit 70d01e2d2, run with -portfolio=strings option. 
Table 1: Results of experiments on all benchmark sets. For each tool and benchmark set (as well
as whole groups under Σ), we give the number of unsolved instances. Results for tools with the
highest number of solved instances are in bold. Numbers with * contain also incorrect results.

Regex
                    Aut    Den  StrFuzz    Syg        Σ
Included         15,995    999   10,050    343   27,387
Unsupported           0      0    1,568      0    1,568
cvc5                 94     18    1,037      0    1,149
Z3                  113    118      340      0      571
Z3str4               60      4       27      0       91
OSTRICH              55     15      220      0      299
Z3str3RE             66     27     *143      1     *237
Z3-Noodlerᵖʳ         86      1   *1,014      0   *1,101

Equations
                    Kal    Kep   Norn  Slent   Slog    Web    Woo        Σ
Included         19,432    587  1,027  1,128  1,976    267    809   25,226
Unsupported           0      0      0      0      0    414      0      414
cvc5                  0    240     85     22      0     40     54      441
Z3                  164    313    124     74     71     61     25      832
Z3str4              174    254     73     73     16     62     78      730
OSTRICH             288    387      1    130      7     65     53      931
Z3str3RE           *144    311    133     87     55   *104   *118     *952
Z3-Noodlerᵖʳ        508    575      0      6      0     *3    256   *1,348

Predicates-small and PyEx
                 StrInt   Leet  StrSm        Σ     PyEx
Included         11,669  2,652  1,670   15,991   23,845
Unsupported       5,299      0    210    5,509        0
cvc5                  0      0      4        4       34
Z3                    4      0     32       36    1,071
Z3str4                5      4     37       46      570
OSTRICH              37     26   *106     *169   12,290
Z3str3RE             64    192   *179     *435   17,764
Z3-Noodlerᵖʳ         40     29   *493     *562  *13,362


Predicates-small Although Z3-Noodler focuses mainly on word equations with length
    and regular constraints, the evaluation includes also a group consisting of smaller
    formulae that use string predicates such as substr, at, contains, etc. It is formed
    from FullStrInt, LeetCode, and StrSmallRw benchmark sets. We removed 5,509
    formulae containing the to/from_int predicates from FullStrInt and StrSmallRw.


We also consider the PyEx benchmark, which we do not put into any of these
groups, as it contains large formulae with complex predicates (substr, contains,
etc.). We note that we omit the small Transducer+ benchmark because it contains
exclusively formulae with replace_all.


Results. We show the number of unsolved instances for each benchmark and tool
(as well as whole groups) in Table 1. Some tools gave incorrect results (determined by
comparing to the output of cvc5 and Z3) for some benchmarks. Usually, this was less
than 10 instances, except for Z3str3RE on StringFuzz and StrSmallRw (50 and 12
incorrect results respectively) and Z3-Noodlerᵖʳ on StrSmallRw (218 incorrect results).
Table 2 then shows the average run times and their standard deviations for solved
instances for each category and tool.

Table 2: Average run times (in seconds) of solved instances and their
standard deviations.

                     Reg             Eq             Pred
                  avg    std     avg    std     avg    std
cvc5             1.17   8.51    0.11   2.15    0.03   0.15
Z3               1.92   9.71    0.18   2.83    0.04   0.42
Z3str4           0.35   2.00    0.25   3.40    0.02   0.31
OSTRICH          4.29   8.67    4.28   9.28   12.71  15.08
Z3str3RE         0.31   3.28    0.13   2.72    0.01   0.08
Z3-Noodlerᵖʳ     0.27   2.86    0.12   2.59    0.09   1.69

The results show that Z3-Noodler outperforms other tools on the Regex group (in
particular on Denghang, StringFuzz, and Sygus-qgen) both in the number of solved
instances and the average run time. Only on AutomatArk does it not solve the most
formulae (it solves only 7 fewer than the winner, OSTRICH, while being much faster).

On the Equations group, Z3-Noodler also outperforms other tools on most of the
benchmarks, in particular on Kepler, Norn, Slent, Slog, and Webapp. On Kaluza, it is
outperformed by other tools, but it still solves the vast majority of formulae. Z3-Noodler
has worse performance on Woorpje, which seems to be a synthetic benchmark generated
to showcase the strength of a specialized algorithm (this benchmark is the reason
for Z3-Noodler taking the second place in the whole group). With 0.11 s, Z3-Noodler
and cvc5 have the lowest average run time.
Fig. 2: Comparison of Z3-Noodler with cvc5, Z3, and the virtual best solver (VBS).
Times are in seconds, axes are logarithmic. Dashed lines represent timeouts (120 s).
Colours distinguish groups: Regex, Equations, and Predicates-small.


The winner of Predicates-small is cvc5. In particular, on FullStrInt and LeetCode
the difference to Z3-Noodler is 4 instances each and on StrSmallRw the difference
is 51 cases. The average time of Z3-Noodler is also a bit higher, with 0.11 s for
Z3-Noodler compared to the 0.03 s for cvc5. Similarly, Z3-Noodler is outperformed by
cvc5, Z3, and Z3str4 on PyEx. Indeed, we have not optimized Z3-Noodler for formulae
with large numbers of predicates yet. The results of Z3-Noodler could, however, be
further improved by proper axiom saturation for predicates or lazy predicate evaluation.


In Fig. 2, we show scatter plots comparing the running time of Z3-Noodler with cvc5,
Z3, and the virtual best solver (VBS; a solver that takes the best result from all tools other
than Z3-Noodler) on all three benchmark groups. The plots show that Z3-Noodler
outperforms the competitors on a vast number of instances, in many cases being
complementary to them. To validate this claim, we also checked how different solvers contribute
to a portfolio. That is, we took the VBS including Z3-Noodler (VBS*) and then checked
how well the portfolio works without each of the solvers. Table 3 shows the results on the
Regex and Equations groups (we omit Predicates-small, where Z3-Noodler does not help
the portfolio). The results show that on the two groups, Z3-Noodler is the most valuable
solver in the portfolio. We also include results on the small portfolio of Z3 and cvc5 (with
and without Z3-Noodler) showing that, on the two groups, using just these three solvers
is almost as good as using the whole portfolio of all solvers.

Table 3: Evaluating solver contribution to a portfolio. Times are in seconds.

                                Regex             Equations
                            Unsolved    Time   Unsolved    Time
vss. 73 osos
2,914 131 6,830
1
VBS* - cvc5                     1        549      145     1,401
VBS* - Z3                       1        430       29     1,579
VBS* - Z3str4                   1         47       19     1,416
VBS* - OSTRICH              d» m 2r 115270
VBS* - Z3str3RE                 1        510       20     1,307
cvc5 + Z3 + Z3-Noodler          1        608       22     1,471
cvc5 + Z3                   27827916 303 2,805
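The portfolio evaluation can be sketched as follows. This is an illustration under assumptions, not the authors' evaluation script: run times are stored per solver with None marking an unsolved instance, the reported time is assumed to be the sum of the best times over solved instances, and the solver names and numbers in the usage part are invented.

```python
def portfolio(results, solvers):
    """Virtual best solver over `solvers`.

    `results` maps a solver name to a list of run times (None = unsolved).
    Returns (number of unsolved instances, total time over solved instances).
    """
    num_instances = len(next(iter(results.values())))
    unsolved, total = 0, 0.0
    for i in range(num_instances):
        times = [results[s][i] for s in solvers if results[s][i] is not None]
        if times:
            total += min(times)  # the portfolio takes the fastest answer
        else:
            unsolved += 1
    return unsolved, total

def leave_one_out(results):
    """Portfolio quality when each solver in turn is removed."""
    names = list(results)
    return {s: portfolio(results, [t for t in names if t != s]) for s in names}

# Invented toy data: three solvers, three instances.
toy = {
    "A": [1.0, None, 3.0],
    "B": [2.0, None, None],
    "C": [None, 5.0, 4.0],
}
assert portfolio(toy, list(toy)) == (0, 9.0)
assert leave_one_out(toy)["C"] == (1, 4.0)  # only C solves the second instance
```

A solver is valuable to the portfolio when removing it increases the number of unsolved instances or the total time, which is the comparison Table 3 makes for each tool.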


Comparing with the older version Z3-Noodlerᵖʳ from [20], we can see that there
is a significant improvement in most benchmarks, most significantly in AutomatArk,
StringFuzz, Kepler, StrSmallRw, and Kaluza. We note that adding more complicated
algorithm selection strategies significantly improved the overall performance of
Z3-Noodler, but, on the other hand, decreased the performance on Kaluza (cf. [20]). Better
results in AutomatArk and StringFuzz stem from the improvements in Mata and from
heuristics tailored for regular expressions handling. Including Nielsen's algorithm
has the largest impact on the Kepler benchmark. The improvement on predicate-intensive
benchmarks is caused by optimizations in axiom saturation for predicates. The older
version also had multiple bugs that have been fixed in the current version.
Abstract. We present TaSSAT, a powerful local search SAT solver that
effectively solves hard combinatorial problems. Its unique approach of 
transferring clause weights in local minima enhances its efficiency in 
solving problem instances. Since it is implemented on top of YalSAT, 
TaSSAT benefits from practical techniques such as restart strategies and 
thread parallelization. Our implementation includes a parallel version 
that shares data structures across threads, leading to a significant re- 
duction in memory usage. Our experiments demonstrate that TaSSAT 
outperforms similar solvers on a vast set of SAT competition bench- 
marks. Notably, with the parallel configuration of TaSSAT, we improve 
lower bounds for several van der Waerden numbers. 


Keywords: Local Search for SAT · Weight Transfer · Memory Efficiency


1 Introduction 


The SAT problem asks if there exists a satisfying truth assignment for a given 
formula in propositional logic. SAT is known to be intractable [11], but modern 
SAT solvers, particularly conflict-driven clause learning (CDCL) solvers, have 
made significant progress in solving large formulas from various application do- 
mains. When it comes to combinatorial problems, stochastic local search (SLS) 
solvers are often more effective than CDCL. Because SLS and CDCL solvers 
have complementary strengths, some SAT solvers like Kissat [7] and CryptoMin- 
iSAT [17] combine SLS and CDCL techniques, and SLS methods play a key role 
in shaping the capabilities of modern SAT solvers. 

SLS solvers explore truth assignments by flipping the truth value of individual 
variables until a solution is found or until timeout. The solver generally tries to 
flip variables that will minimize the number of falsified clauses. When a solver 
determines that no variable flip will lead to an improvement according to some 
heuristic or metric, it has reached a local minimum. 

To escape local minima, the solver can either make random flips or adjust its
internal state until improvement is possible. Despite being an effective family of
algorithms for escaping local minima, Dynamic Local Search (DLS) has attracted
limited attention in recent years. DLS algorithms assign weights to clauses,
search for a solution by minimizing the total amount of weight held by
falsified clauses, and adjust these weights in local minima as a means of escaping
them.

The tool we present in this paper is ultimately based on DDFW [16] (di- 
vide and distribute fixed weights), a DLS algorithm that dynamically transfers 
weight from satisfied to falsified clauses along neighborhood relationships in local 
minima. DDFW is remarkably effective at solving hard combinatorial problems, 
such as matrix multiplication [14], graph coloring [13], edge matching [12], the 
coloring of the Pythagorean triples [15], and finding bounds for van der Waer- 
den numbers [3]. Notably, DDFW solves satisfiable instances of the Pythagorean 
triples problem in under a minute, whereas CDCL solvers take CPU years. 

In this paper, we introduce Transfer and Share SAT (TaSSAT), a novel par- 
allel SLS solver. TaSSAT implements LiWeT, a simplification of the algorithm 
from our recent work [10] modifying DDFW. Our implementation of TaSSAT is 
built on top of a leading SLS solver YalSAT [5], and it adds two new features. 
First, it incorporates the weight-transfer methods from LiWeT, leading to more 
efficient solving. Specifically, a new weight-transfer parameter allows TaSSAT to 
shift more clause weight in local minima, enhancing its adaptability during the 
search. Second, TaSSAT's parallel mode shares data structures among threads 
to reduce its memory footprint by up to 80%. 

Our results show that TaSSAT substantially outperforms YalSAT on an ex- 
tensive benchmark set of 5355 anniversary instances from the 2022 SAT Compe- 
tition. Further, TaSSAT's parallel version improves the lower bounds for nine van 
der Waerden numbers, surpassing prior work by Ahmed et al. [3] that used 29 
algorithms (including DDFW) and extensive parallelization. Our results demon- 
strate the clear algorithmic and practical improvements of TaSSAT. 


2 Preliminaries 


A SAT formula in conjunctive normal form (CNF) is a conjunction of clauses,
each of which is a disjunction of literals (Boolean variables or their negations).
A clause C is satisfied by a truth assignment α if α satisfies at least one of its
literals, and is otherwise falsified. A formula F is satisfied by α when all of its
clauses are. Clauses C and D are neighbors if they share a common literal.

In DLS, clauses are assigned weights, denoted as W : C → ℝ≥0, representing
the cost of leaving a clause falsified. The total weight of the falsified clauses
is the falsified weight. Variables that reduce the falsified weight when flipped
are called weight-reducing variables, while those that do not impact the falsified
weight when flipped are called sideways variables.
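These definitions can be sketched in a few lines. The representation is an assumption made for illustration: clauses are lists of non-zero integers in the DIMACS style (a negative number is a negated variable), the assignment is a dictionary from variables to Booleans, and the function names are hypothetical.

```python
def is_satisfied(clause, assign):
    """A clause is satisfied if at least one of its literals is true."""
    return any(assign[abs(lit)] == (lit > 0) for lit in clause)

def falsified_weight(clauses, weights, assign):
    """Total weight held by the clauses falsified under `assign`."""
    return sum(w for c, w in zip(clauses, weights) if not is_satisfied(c, assign))

def classify_variables(clauses, weights, assign):
    """Split variables into weight-reducing and sideways ones."""
    base = falsified_weight(clauses, weights, assign)
    reducing, sideways = [], []
    for var in sorted(assign):
        flipped = dict(assign)
        flipped[var] = not assign[var]
        delta = falsified_weight(clauses, weights, flipped) - base
        if delta < 0:
            reducing.append(var)
        elif delta == 0:
            sideways.append(var)
    return reducing, sideways

clauses = [[1, 2], [-1, 2], [1, -2]]
assign = {1: False, 2: False}
# With uniform weights, both flips only trade one falsified clause for another.
assert classify_variables(clauses, [8, 8, 8], assign) == ([], [1, 2])
# After shifting weight onto the falsified clause, both flips reduce the weight.
assert classify_variables(clauses, [16, 0, 8], assign) == ([1, 2], [])
```

The second call shows why moving weight onto falsified clauses helps: an assignment that was a local minimum under uniform weights stops being one.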

DDFW starts with a random initial truth assignment and sets all clause
weights to parameter w₀ (w₀ = 8 in the original paper [16]). It then flips
weight-reducing variables until none remain. Upon reaching a local minimum, DDFW
randomly chooses between making a sideways flip (if possible, and with a 15%
chance) or entering the weight transfer phase. During weight transfer, each
falsified clause receives a fixed weight from a maximum-weight satisfied neighbor Cₛ
(except for 1% of the time, when a random satisfied clause is chosen instead). The
amount of weight transferred from Cₛ depends on its weight: if W(Cₛ) > w₀,
then a weight of 2 is taken; otherwise, a weight of 1 is taken.

Fig. 1: PAR-2 scores for parameter searches on initpct, basepct, and currpct.
The plots are oriented to best show the performance trends, so the axes vary.


3 LiWeT: The Linear Weight Transfer Algorithm 


TaSSAT takes ideas from DDFW and distills them into an algorithm called
LiWeT (Linear Weight Transfer), which is a simplification of our prior work [10].
LiWeT uses a novel linear weight transfer rule to determine how much weight
to move in local minima. The rule takes three parameters: currpct, a multiplier
on the current clause's weight; basepct, a multiplier on the initial weight w₀;
and initpct, a multiplier for clauses with exactly w₀ weight. For most clauses
Cₛ, the amount of weight that is transferred is currpct · W(Cₛ) + basepct · w₀.
For clauses with W(Cₛ) = w₀, the amount taken is initpct · w₀. As a result,
initpct controls how much weight is initially taken from a clause.
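The two transfer rules can be compared side by side. The sketch below follows the rules as stated in the text; the function names are hypothetical, and the parameter values in the usage part are the ones the paper reports using in its experiments, (initpct, basepct, currpct) = (1, 0.175, 0.075), with w₀ = 8.

```python
import math

def ddfw_transfer(w_cs, w0):
    """DDFW: take 2 from clauses heavier than w0, otherwise take 1."""
    return 2 if w_cs > w0 else 1

def liwet_transfer(w_cs, w0, initpct, basepct, currpct):
    """LiWeT: linear rule, with a separate case for clauses still at w0."""
    if w_cs == w0:
        return initpct * w0
    return currpct * w_cs + basepct * w0

w0 = 8
# A clause still at its initial weight gives up all of it when initpct = 1.
assert liwet_transfer(8, w0, 1, 0.175, 0.075) == 8
# A clause of weight 10 gives 0.075 * 10 + 0.175 * 8 = 2.15.
assert math.isclose(liwet_transfer(10, w0, 1, 0.175, 0.075), 2.15)
# DDFW would move only 1 and 2 units of weight in the same two situations.
assert (ddfw_transfer(8, w0), ddfw_transfer(10, w0)) == (1, 2)
```

The comparison illustrates the effect described above: with initpct = 1, the first transfer out of an untouched clause moves far more weight than the fixed DDFW amounts.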

The weight transfer rule offers two key advantages. First, the use of
floating-point parameters rapidly establishes distinct weights for clauses, eliminating the
need for tie-breaking near local minima and, consequently, explicit sideways flips.
Second, the initpct parameter enables LiWeT to release a larger proportion of
the total clause weight, enhancing its adaptability to challenging formulas. In
DDFW and LiWeT, maximum-weight neighbors are selected for each falsified
clause within local minima. Clauses with weights less than w₀ are unlikely to
contribute more weight, artificially reducing the total amount of weight LiWeT
can move around. The initpct parameter prevents this from happening.

LiWeT differs from DDFW in one other respect: in local minima, it increases
the probability of choosing a randomly satisfied clause, rather than a
maximum-weight neighbor, to 10%. We found that this improves overall performance.

Algorithm 1 shows LiWeT’s pseudocode. 
Algorithm 1: The LiWeT algorithm
Input: CNF formula F, w₀, initpct, basepct, currpct
Output: Satisfiability of F

 1  W(C) ← w₀ for all C ∈ F
 2  α ← random truth assignment on the variables in F
 3  for 1 to MAXFLIPS do
 4      if α satisfies F then return "SAT"
 5      else
 6          if a weight-reducing variable is available then
 7              flip the variable that reduces the falsified weight the most
 8          else
 9              foreach clause C ∈ F falsified under α do
10                  Cₛ ← select a satisfied clause
11                  if W(Cₛ) = w₀ then w ← initpct · w₀
12                  else w ← currpct · W(Cₛ) + basepct · w₀
13                  transfer w from Cₛ to C
14  return "No SAT"
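A compact, unoptimized rendering of Algorithm 1 may help to see how the pieces interact. It is a sketch under assumptions and not the TaSSAT implementation: clauses are DIMACS-style integer lists, the satisfied clause is chosen as a maximum-weight satisfied neighbor with a 10% chance of a random satisfied clause (as described in the text), the transferred amount is capped at the donor's weight to keep weights non-negative (an assumption of this sketch), and falsified weights are recomputed from scratch rather than maintained incrementally.

```python
import random

def liwet(clauses, num_vars, w0=8.0, initpct=1.0, basepct=0.175,
          currpct=0.075, max_flips=100_000, seed=0):
    """Return a satisfying assignment (list of bools for x1..xn) or None."""
    rng = random.Random(seed)
    weight = [w0] * len(clauses)
    assign = [rng.random() < 0.5 for _ in range(num_vars + 1)]  # index 0 unused

    def satisfied(c):
        return any(assign[abs(lit)] == (lit > 0) for lit in clauses[c])

    def falsified_weight():
        return sum(weight[c] for c in range(len(clauses)) if not satisfied(c))

    for _ in range(max_flips):
        falsified = [c for c in range(len(clauses)) if not satisfied(c)]
        if not falsified:
            return assign[1:]
        # Lines 6-7: look for the flip that reduces the falsified weight most.
        base = sum(weight[c] for c in falsified)
        best_var, best_gain = None, 0.0
        for var in {abs(lit) for c in falsified for lit in clauses[c]}:
            assign[var] = not assign[var]
            gain = base - falsified_weight()
            assign[var] = not assign[var]
            if gain > best_gain:
                best_var, best_gain = var, gain
        if best_var is not None:
            assign[best_var] = not assign[best_var]
            continue
        # Lines 9-13: local minimum, move weight onto the falsified clauses.
        sat = [c for c in range(len(clauses)) if satisfied(c)]
        if not sat:
            continue
        for c in falsified:
            neighbors = [d for d in sat if set(clauses[c]) & set(clauses[d])]
            if neighbors and rng.random() >= 0.10:
                cs = max(neighbors, key=lambda d: weight[d])
            else:
                cs = rng.choice(sat)
            if weight[cs] == w0:
                w = initpct * w0
            else:
                w = currpct * weight[cs] + basepct * w0
            w = min(w, weight[cs])  # assumption: never take more than is there
            weight[cs] -= w
            weight[c] += w
    return None

# (x1 or x2) and (not x1 or x2) and (x1 or not x2) has the unique model x1 = x2 = True.
assert liwet([[1, 2], [-1, 2], [1, -2]], num_vars=2) == [True, True]
```

Only variables occurring in falsified clauses are tried in lines 6-7, since flipping any other variable cannot satisfy a falsified clause and therefore cannot reduce the falsified weight.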


To determine the effect of the three parameters, we conducted parameter
searches across them. We ranged basepct ∈ [0, 0.3], currpct ∈ [0, 0.2], and
initpct ∈ [0, 1.0] with increments of 0.1, 0.05 and 0.2, respectively. Our searches
were done on a combined 168 instances from the 2019 SAT Race and the 2021
and 2022 SAT competitions, each with a 900-second timeout. We picked these
instances because they were solved by previous versions of LiWeT and DDFW,
and thus were less likely to result in timeout.


Figure 1 shows the PAR-2 scores for two parameter searches, where a lower
score indicates better performance.¹ The left plot shows that TaSSAT performs
better with higher values of both basepct and currpct when initpct = 1. 
The optimal configuration is (basepct, currpct) = (0.175,0.075). The right 
plot shows that LiWeT performs best when initpct = 1 for any basepct value 
when currpct = 0. This suggests that taking all weight from satisfied clauses 
early in the search is crucial for better performance. We ran all subsequent 
TaSSAT experiments with (initpct, basepct, currpct) = (1, 0.175, 0.075). 
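The PAR-2 score used in these plots is the average solving time in which every unsolved instance counts as twice the timeout. A minimal sketch of the computation follows; the function name and the convention of marking unsolved instances with None are assumptions for illustration, and the sample times are invented.

```python
def par2(times, timeout):
    """Average solving time, counting unsolved instances as 2 * timeout.

    `times` holds one entry per instance: the solving time in seconds,
    or None if the instance was not solved within the timeout.
    """
    penalised = [
        t if t is not None and t <= timeout else 2 * timeout
        for t in times
    ]
    return sum(penalised) / len(penalised)

# Two solved instances and one timeout with the 900-second limit:
# (10 + 1800 + 50) / 3 = 620.
assert par2([10, None, 50], timeout=900) == 620.0
```

Because the penalty dominates the average, a configuration that solves one more instance usually scores better than one that is merely faster on the instances it already solves.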

We conclude this section by outlining the distinctions between the algorithm
presented in [10] and LiWeT, underscoring the simplifications introduced in the
latter compared to the former. Compared to the algorithm from our previous
work [10], LiWeT has two fewer parameters. Previously, the algorithm used two
pairs of (a, c) parameters to transfer a × W(Cₛ) + c weight from satisfied clauses
Cₛ in local minima. One pair of (a, c) values was used when W(Cₛ) > w₀, and
the other for when W(Cₛ) = w₀. In LiWeT, we replaced the second pair with
initpct. Then, based on the observation in the right plot of Figure 1, we set
initpct to 1 for performance reasons. This adjustment eliminates initpct from
line 11 of Algorithm 1, transforming it into a two-parameter algorithm.

Another simplification was the removal of sideways variable flips from
LiWeT. DDFW and previous versions of our algorithm would flip sideways
variables, but we found that they rarely occurred with floating-point weights, and
refusing to flip them didn't affect performance. Notably, these simplifications
enhance the algorithmic power of LiWeT over the previous algorithm, which we
demonstrate in Section 5.

¹ The PAR-2 score is defined as the average solving time, with twice the timeout as
the time for unsolved instances.


4 Implementation of TaSSAT and PaSSAT 


We implemented TaSSAT on top of YalSAT [6], a state-of-the-art SLS solver that 
implements the ProbSAT algorithm [4]. As a result, our implementation benefits 
from the practical techniques present in YalSAT, including restart techniques. 
Our TaSSAT implementation² includes a parallel version, called PaSSAT, that
improves the memory management of the parallel version of YalSAT. 

Because LiWeT is computationally expensive when there is a large number
of falsified clauses, TaSSAT has an optional mode to run ProbSAT until the
number of falsified clauses drops beneath a dynamically computed threshold
based on the formula's size, at which point it resumes LiWeT. By default, we
ran TaSSAT with this option disabled in our experiments, but we enabled it for
the van der Waerden experiments.

We also improve on the parallel features in YalSAT. The main issue in the 
parallel version of YalSAT was that the formula data structures were not shared. 
As a result, each thread had to independently parse, store, and simplify the input 
formula, resulting in redundant computation and a bloated memory footprint. 
We solved this problem in PaSSAT by nominating a primary thread to parse and 
simplify the formula and to allocate the core data structures. Once the primary 
thread finishes, it hands solving off to the secondary threads, which can then 
jointly refer to the shared data structures. 
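The parse-once, share-everywhere scheme can be illustrated with a toy example. This is only a sketch of the idea and not the PaSSAT implementation: it uses Python threads, a simplified DIMACS parser that assumes one clause per line, and hypothetical function names.

```python
import threading

def parse_dimacs(text):
    """Parse a DIMACS CNF string into an immutable tuple of clauses.

    Simplifying assumption: every clause sits on its own line and ends in 0.
    """
    clauses = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line[0] in "cp":  # skip comments and the header
            continue
        literals = [int(token) for token in line.split()]
        clauses.append(tuple(lit for lit in literals if lit != 0))
    return tuple(clauses)

def run_workers(text, num_workers, work):
    """The primary thread parses once; all workers read the same object."""
    formula = parse_dimacs(text)  # done a single time, before any worker starts
    results = [None] * num_workers

    def worker(index):
        results[index] = work(formula, index)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results

cnf = "p cnf 2 2\n1 2 0\n-1 2 0\n"
# Every worker reports the identity of the formula object it was given.
ids = run_workers(cnf, 4, lambda formula, index: id(formula))
assert len(set(ids)) == 1  # one shared copy instead of four
```

Since the workers only read the formula, no locking is needed for it; each worker would keep its own mutable search state (assignment and clause weights) privately.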


5 Evaluation 


We now present our experimental results³ of TaSSAT against similar algorithms.
Our baseline solvers are the original YalSAT (YalSAT-Prob); our DDFW-inspired, 
YalSAT-based solver from previous work [10] (YalSAT-Lin); a YalSAT-based im- 
plementation of DDFW (YalSAT-DDFW); and the UBCSAT implementation of 
DDFW (UBCSAT-DDFW). We include two DDFW implementations to check 
that the YalSAT version performs similarly to the UBCSAT one, despite being 
implemented with a different base solver. 

We ran these four solvers on two benchmark sets: a set of 5355 instances
from the 2022 SAT Competition's anniversary track (the anni set) [1]
covering instances from the previous 20 years of competition, and a set of nine van
der Waerden number instances.⁴ For reproducibility, we set all randomization
seeds to 0. For the anni instances, we ran TaSSAT and our baseline solvers in
the StarExec Cluster [2] with a 5000-second timeout. For the van der Waerden
instances, we ran the parallel version of TaSSAT with and without the
ProbSAT-LiWeT option with a 48-hour timeout on the Bridges-2 cluster [8] with AMD
EPYC 7742 CPUs (128 cores, 512GB RAM).

² TaSSAT source code is available at https://github.com/solimul/tassat.
3 Details are available at https://github.com/solimul/TACAS-24-solve, details.

Fig. 2: Performance profiles for solver modifications on the anni benchmark set
show that TaSSAT significantly outperforms the others. Since all solvers can
quickly solve 600 instances, we start the y-axis at 600 to improve readability.

Figure 2 illustrates our results for the anni dataset. TaSSAT performed the 
best by solving 1040 problem instances, surpassing YalSAT-Lin, UBCSAT-DDFW, 
YalSAT-DDFW, and YalSAT-Prob with 969, 874, 859, and 857 solved instances, 
respectively. In particular, TaSSAT solved 71 more instances than YalSAT-Lin, 
the solver from our previous work, showing that our algorithmic changes are, 
in fact, improvements. The slight difference in solve counts between UBCSAT- 
DDFW and YalSAT-DDFW (874 vs. 859) can be attributed to random noise. 

Notably, TaSSAT exclusively solved 12 instances that no 2022 SAT Compe- 
tition solver could. However, YalSAT-Prob, YalSAT-Lin, UBCSAT-DDFW, and 
YalSAT-DDFW solved 73, 42, 40, and 38 anni instances, respectively, that TaS- 
SAT could not. 

We also present new lower bounds for van der Waerden numbers by running 
PaSSAT. The van der Waerden number w(2; 3, t) is the smallest natural number n 
where for any partition of {1, ..., n} into P_0 and P_1, either P_0 contains a 
3-term arithmetic progression or P_1 contains a t-term arithmetic progression. In 
Table 1, we present in the top row previously-known lower bounds for w(2; 3, t) 
for 31 ≤ t ≤ 39. 


4 Available at https://github.com/solimul/vdw9. 


Table 1: Lower bounds for van der Waerden numbers w(2; 3, t). 

t                   31    32    33    34    35    36    37    38    39 
Ahmed et al. [3]   930  1006  1063  1143  1204  1257  1338  1378  1418 
Our work           953  1011  1071  1145  1208  1260  1341  1380  1419 
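A lower bound w(2; 3, t) > n is witnessed by a single partition of {1, ..., n} that avoids both progressions. The following is a minimal sketch of how such a witness could be checked, not the authors' tooling; the names `has_ap` and `is_vdw_certificate` are hypothetical, and the small example relies on the known value w(2; 3, 3) = 9 rather than on any instance from Table 1.

```python
def has_ap(members, length, n):
    """Return True if `members` (a subset of 1..n) contains an arithmetic
    progression with `length` terms."""
    for start in members:
        step = 1
        # The last term of the progression must still lie inside 1..n.
        while start + (length - 1) * step <= n:
            if all(start + k * step in members for k in range(length)):
                return True
            step += 1
    return False


def is_vdw_certificate(n, p0, t):
    """Check that the partition (p0, rest) of 1..n has no 3-term progression
    in p0 and no t-term progression in the rest, which shows w(2; 3, t) > n."""
    p0 = set(p0)
    p1 = set(range(1, n + 1)) - p0
    return not has_ap(p0, 3, n) and not has_ap(p1, t, n)
```

For instance, the partition {1, 2, 5, 6} / {3, 4, 7, 8} of {1, ..., 8} passes the check for t = 3, while extending it to 9 elements by adding 9 to the second part does not.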

The best lower bounds are obtained when PaSSAT leverages TaSSAT with the 
activation of the ProbSAT-LiWeT toggle and integrates YalSAT-style restarts. 
This configuration solves all 9 vdw benchmarks, pushing the lower bounds of 
these 9 numbers to values that are highlighted in the bottom row of Table 1. In 
contrast, using the default TaSSAT configuration, PaSSAT solves 7 vdw benchmarks, 
establishing the same lower bounds for all the numbers shown in the bottom 
row of Table 1, except for w(2; 3, 32) and w(2; 3, 37). Hence, this version enhances 
the lower bounds for w(2;3,32) and w(2;3,37) to 1010 and 1340, respectively, 
just 1 short of their best-evaluated lower bounds. The performance of TaSSAT- 
Prob-LiWeT compared to TaSSAT-LiWeT is evident in their respective average 
PAR-2 scores, with values of 31,943 and 91,744. 
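For readers unfamiliar with the metric, PAR-2 is conventionally the average runtime in which every unsolved instance is charged twice the timeout. The sketch below illustrates that convention only; the function name `par2_score` and the use of `None` for unsolved instances are assumptions, and the scoring script used for the numbers above is not shown in this paper.

```python
def par2_score(runtimes, timeout):
    """Average PAR-2 score: the runtime for solved instances and
    2 * timeout for unsolved ones.

    `runtimes` holds one entry per instance: seconds taken, or None if
    the instance was not solved within the timeout.
    """
    penalised = [
        t if t is not None and t <= timeout else 2 * timeout
        for t in runtimes
    ]
    return sum(penalised) / len(penalised)
```

With a 48-hour timeout (172,800 seconds), each unsolved instance contributes 345,600 seconds to the sum, so a configuration that leaves even two of nine instances unsolved has an average of at least 76,800.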


Putting these results into perspective, Ahmed et al. [3] were unable to solve 
any of these vdw instances, despite employing 29 algorithms and extensive par- 
allelization. Notably, the best result attained by Ahmed et al. using only SLS 
methods for w(2;3,31) was 919. We improved this bound to 953. These results 
emphasize the unique algorithmic strengths of our solver. 

In addition to improved solving, PaSSAT achieves significant memory 
reduction compared to our previous parallel solver [10]. Across the seven vdw 
benchmarks solved by both PaSSAT and the parallel solver, the average memory 
reduction is substantial, decreasing from 3.2 GB to 686.17 MB, a nearly 80% 
reduction. The reduction held even for the largest problem instance (t = 39), 
where the memory footprint decreased by nearly 80%, from 4.42 GB to 966 MB. 


Code and Data Availability Statement 


The code and data that support the contributions of this work are openly available 
in the “Artifact for TaSSAT: A Stochastic Local Search Solver for SAT” 
at https://zenodo.org/records/10042124 [9]. The authors confirm that the 
data supporting the findings of this study are available within the article and 
the artifact. 


Abstract. State-of-the-art model-checking algorithms like IC3/PDR are 
based on uni-directional modular SAT solving for finding and/or blocking 
counterexamples. Modular SAT-solvers divide a SAT-query into multiple 
sub-queries, each solved by a separate SAT-solver (called a module), and 
propagate information (lemmas, proof obligations, blocked clauses, etc.) 
between modules. While modular solving is key to IC3/PDR, it is obvi- 
ously not as effective as monolithic solving, especially when individual 
sub-queries are harder to solve than the combined query. This is par- 
tially addressed in SAT modulo SAT (SMS) by propagating unit literals 
back and forth between the modules and using information from one 
module to simplify the sub-query in another module as soon as possible 
(i.e., before the satisfiability of any sub-query is established). However, 
bi-directionality of SMS is limited because of the strict order between de- 
cisions and propagation — only one module is allowed to make decisions, 
until its sub-query is SAT. In this paper, we propose a generalization 
of SMS, called SPECSMS, that speculates decisions between modules. 
This makes it bi-directional — decisions are made in multiple modules, 
and learned clauses are exchanged in both directions. We further extend 
DRUP proofs and interpolation, both of which are useful in model checking, to 
SPECSMS. We have implemented SPECSMS in Z3 and empirically validate 
it on a series of benchmarks that are provably hard for SMS. 


1 Introduction 


IC3/PDR [3] is an efficient SAT-based Model Checking algorithm. Among many 
other innovations in IC3/PDR is the concept of a modular SAT-solver that di- 
vides a formula into multiple frames and each frame is solved by an individual 
SAT solver. The solvers communicate by exchanging proof obligations (i.e., sat- 
isfying assignments) and lemmas (i.e., learned clauses). 

While modular reasoning in IC3/PDR is very efficient for a Model Checker, 
it is not as efficient as a classical monolithic SAT-solver. This is not surprising 
since modularity restricts the solver to colorable refutations [11], which are, in the 
worst case, exponentially bigger than unrestricted refutations. On the positive 
side, IC3/PDR's modular SAT-solving makes interpolation trivial, and enables 
generalizations of proof obligations and inductive generalization of lemmas; 
both are key to the success of IC3/PDR. 

© The Author(s) 2024 

This motivates the study of modular SAT-solving, initiated by SMS [1]. Our 
strategic vision is that our study will contribute to improvements in IC3/PDR. 
However, in this paper, we focus on modular SAT-solving in isolation. 

In modular SAT-solving, multiple solvers interact to check satisfiability of a 
partitioned CNF formula, where each part of the formula is solved by one of the 
solvers. In this paper, for simplicity, we consider the case of two solvers (Ss, Sm) 
checking satisfiability of a formula pair (Φs, Φm). Sm is a main solver and Ss is 
a secondary solver. In the notation, the solvers are written right-to-left to align 
with IC3/PDR, where the main solver is used for frame 1 and the secondary 
solver is used for frame 0. 

When viewed as a modular SAT-solver, IC3/PDR is uni-directional. First, 
Sm finds a satisfying assignment σ to Φm and only then, Ss extends σ to an 
assignment for Φs. Learned clauses, called lemmas in IC3/PDR, are only shared 
(or copied) from the secondary solver Ss to the main solver Sm. 

SAT Modulo SAT (SMS) [1] is a modular SAT-solver that extends IC3/PDR 
by allowing inter-modular unit propagation and conflict analysis: whenever an 
interface literal is placed on a trail of any solver, it is shared with the other solver 
and both solvers run unit propagation, exchanging unit literals. This makes 
modular SAT-solving in SMS bi-directional as information flows in both directions 
between the solvers. Bi-directional reasoning can simplify proofs, but it 
significantly complicates conflict analysis. To manage conflict analysis, SMS does not 
allow the secondary solver Ss to make any decisions before the main solver Sm is 
able to find a complete assignment to its clauses. As a result, learned clauses are 
either local to each solver, or flow only from Ss to Sm, restricting the structure 
of refutations similarly to IC3/PDR. 

Both IC3/PDR and SMS require Sm to find a complete satisfying assignment 
to Φm before the solving is continued in Ss. This is problematic since Φm might 
be hard to satisfy, causing them to get stuck in Φm, even if considering both 
formulas together quickly reveals the (un)satisfiability of (Φs, Φm). 

In this paper, we introduce SPECSMS, a modular SAT-solver that employs 
truly bi-directional reasoning. SPECSMS builds on SMS, while facilitating 
deeper communication between the modules by (1) allowing learnt clauses to 
flow in both directions, and (2) letting the two solvers interleave their decisions. 
The key challenge is in the adaptation of conflict analysis to properly handle the 
case of a conflict that depends on decisions over local variables of both solvers. 
Such a conflict cannot be explained to either one of the solvers using only in- 
terface clauses (i.e., clauses over interface variables). It may, therefore, require 
backtracking the search without learning any conflict clauses. To address this 
challenge, SPECSMS uses speculation, which tames decisions of the secondary 
solver that are interleaved with decisions of the main solver. If the secondary 
solver satisfies all of its clauses during speculation, a validation phase is em- 
ployed, where the main solver attempts to extend the assignment to satisfy its 
unassigned clauses. If speculation leads to a conflict which depends on local 
decisions of both solvers, refinement is employed to resolve the conflict. Refinement 
ensures progress even if no conflict clause can be learnt. With these ingredients, 
we show that SPECSMS is sound and complete (i.e., always terminates). 

To certify SPECSMS's result when it determines that a formula is unsatisfiable, 
we extract a modular clausal proof from its execution. To this end, we extend 
DRUP proofs [12] to account for modular reasoning, and devise a procedure 
for trimming modular proofs. Such proofs are applicable both to SPECSMS and 
to SMS. Finally, we propose an interpolation algorithm that extracts an 
interpolant [4] from a modular proof. Since clauses are propagated between the solvers 
in both directions, the extracted interpolants have the shape ⋀_i (C_i ⇒ cls_i), 
where each C_i is a conjunction of clauses and each cls_i is a clause. 

Original SMS is implemented on top of MiniSAT. For this paper, we 
implemented both SMS and SPECSMS in Z3 [5], using the extendable SAT-solver 
interface of Z3. Thanks to its bi-directional reasoning, SPECSMS is able to ef- 
ficiently solve both sat and unsat formulas that are provably hard for existing 
modular SAT-solvers, provided that speculation is performed at the right time. 
We describe a simple heuristic to decide when to speculate. 

In summary, we make the following contributions: (i) the SPECSMS algo- 
rithm that leverages bi-directional modular reasoning (Sec. 3); (ii) modular 
DRUP proofs for SPECSMS (Sec. 4.1); (iii) proof-based interpolation algorithm; 
(iv) heuristics to guide speculation (Sec. 5); and (v) implementation and valida- 
tion (Sec. 6). 


2 Motivating examples 


In this section, we discuss two examples in which both IC3/PDR-style uni- 
directional reasoning and SMS-style shallow bi-directional reasoning are ineffec- 
tive. The examples illustrate why existing modular reasoning gets stuck. To bet- 
ter convey our intuition, we present our problems at word level using bit-vector 
variables directly, without explicitly converting them to propositional variables. 
Example 1. Consider the following modular sat query: (φ_in, φ_SHA-1), where 
φ_in ≜ (in = in_1) ∨ (in = in_2), in is a 512-bit vector, in_1 and in_2 are 512-bit 
values, φ_SHA-1 ≜ (SHA-1_circ(in) = SHA-1_in1), SHA-1_circ(in) is a circuit that 
computes SHA-1 of in, and SHA-1_in1 is the 20 byte SHA-1 message digest of in_1. 

Checking the satisfiability of φ_in ∧ φ_SHA-1 is easy because it contains both 
the output and the input of the SHA-1 circuit. However, existing modular 
SAT-solvers attempt to solve the problem starting by finding a complete satisfying 
assignment to φ_SHA-1. This is essentially the problem of inverting the SHA-1 
function, which is known to be very hard for a SAT-solver. The improvements in 
SMS allow unit propagation between the two modules. However, this does not 
help since there are no unit clauses in φ_in. 

On the other hand, SPECSMS proceeds as follows: (1) when checking 
satisfiability of φ_SHA-1, it decides to speculate, (2) it starts checking satisfiability of φ_in, 
branches on variables in, finds an assignment σ to in and unit propagates σ to 
φ_SHA-1; (3) if there is a conflict in φ_SHA-1, it learns the conflict clause in ≠ in_2, 
and (4) it terminates with a satisfying assignment in = in_1. Speculation in step 
(1) is what differentiates SPECSMS from IC3/PDR and SMS. When exactly 
SPECSMS speculates is guided by a heuristic that is explained in 
Sec. 5. 
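The asymmetry in Example 1 can be illustrated outside a SAT solver. The sketch below is only an analogy and not the SPECSMS implementation: it uses Python's `hashlib.sha1` in place of the SHA-1 circuit, and the function name `speculate` and the list of "learned" exclusions are hypothetical. Branching on the two candidate inputs and pushing each through the hash function takes two evaluations, whereas starting from the digest would require inverting SHA-1.

```python
import hashlib


def speculate(candidates, target_digest):
    """Branch on each candidate input, 'propagate' it through SHA-1, and
    record an exclusion for every candidate that conflicts with the digest."""
    learned = []  # candidates ruled out, playing the role of `in != c` clauses
    for candidate in candidates:
        if hashlib.sha1(candidate).digest() == target_digest:
            return candidate, learned
        learned.append(candidate)
    return None, learned


# Two 512-bit inputs; the target is the 20-byte digest of the first one.
in_1 = bytes(64)
in_2 = bytes([1]) * 64
target = hashlib.sha1(in_1).digest()
witness, learned = speculate([in_2, in_1], target)
```

Here `witness` is `in_1` and `learned` contains only `in_2`, mirroring steps (3) and (4) of the example.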


Example 2. Speculation is desirable for unsatisfiable formulas as well. Consider 
the modular sat query (φ_+, φ_-), where φ_+ ≜ (a < 0 ⇒ x) ∧ (a ≥ 0 ⇒ x) ∧ 
PHP^1_32, and φ_- ≜ (b < 0 ⇒ ¬x) ∧ (b ≥ 0 ⇒ ¬x) ∧ PHP^2_32. Here, a and 
b are 32-bit-wide bitvectors and local to the respective modules. PHP_32 encodes 
the problem of fitting 32 pigeons into 31 holes, and PHP^1_32 and PHP^2_32 denote 
a partitioning of PHP_32 into 2 problems such that both formulas contain all 
variables. The modular problem (φ_+, φ_-) is unsatisfiable, x and PHP^1_32 being 
two possible interpolants. IC3/PDR and SMS only find the second interpolant. 
This is because all satisfying assignments to φ_- immediately produce a conflict 
in the PHP^1_32 part of φ_+, without having to make any decisions. However, learning 
an interpolant containing x requires searching (i.e., deciding) in both φ_+ and 
φ_-. SPECSMS solves this problem by speculating right after deciding on all b 
variables. During speculation, the secondary solver hits a conflict on x once it 
tries to find an assignment to a variables. Note here that speculating after finding 
assignments to b variables and before finding an assignment to PHP^2_32 is crucial 
for SPECSMS to find the small interpolant. 


These examples highlight the need to speculate while doing modular rea- 
soning. Even though speculation by itself is quite powerful, to make SPECSMS 
effective in practice, we need good heuristics to decide when to enter speculation. 
We discuss some simple heuristics in Sec. 5. 


3 Speculative SAT Modulo SAT 


This section presents SPECSMS, a modular bi-directional SAT algorithm. For 
simplicity, we restrict our attention to the case of two modules. However, the 
algorithm easily generalizes to any sequence of modules. 


3.1 Sat Modulo Sat 


We assume that the reader has some familiarity with internals of a MiniSAT- 
like SAT solver [6] and with SMS [1]. We give a brief background on SMS, 
highlighting some of the key aspects. SMS decides satisfiability of a partitioned 
CNF formula (Φs, Φm) with a set of shared interface variables I. It uses two 
modules (Ss, Sm), where Sm is a main module used to solve Φm, and Ss is a 
secondary module to solve Φs. Each module is a SAT solver (with a slightly 
extended interface, as described in this section). We refer to them as modules 
or solvers, interchangeably. Each solver has its own clause database (initialized 
with Φi for i ∈ {m, s}), and a trail of literals, just as a regular SAT solver. The 
solvers keep their decision levels in sync. Whenever a decision is made in one 
solver, the decision level of the other solver is incremented as well (adding a null 
literal to its trail if necessary). Whenever one solver back-jumps to level i, the 
other solver back-jumps to level i as well. Assignments to interface variables are 
shared between the solvers: whenever such a literal is added to the trail of one 
solver (either as a decision or due to propagation), it is also added to the trail 
of the other solver. SMS requires that Ss does not make any decisions, until Sm 
finds a satisfying assignment to its clauses. 
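The bookkeeping described above (synchronized decision levels, null literals, and shared interface literals) can be sketched as follows. This is an illustrative data-structure sketch under stated assumptions, not the MiniSAT or Z3 implementation: the class names `Module` and `ModulePair` are hypothetical, literals are DIMACS-style integers, and marking a shared decision with the reason `"ext"` in the other solver is our assumption.

```python
EXT = "ext"  # reason marker for literals that come from the other solver


class Module:
    def __init__(self):
        self.trail = []  # entries are (literal, decision level, reason)
        self.level = 0

    def push(self, lit, reason):
        self.trail.append((lit, self.level, reason))

    def backjump(self, level):
        self.trail = [entry for entry in self.trail if entry[1] <= level]
        self.level = level


class ModulePair:
    def __init__(self, interface_vars):
        self.secondary = Module()
        self.main = Module()
        self.interface = set(interface_vars)

    def other(self, module):
        return self.main if module is self.secondary else self.secondary

    def decide(self, module, lit):
        """A decision in one solver raises the decision level of both."""
        other = self.other(module)
        module.level += 1
        other.level += 1
        module.push(lit, "decision")
        if abs(lit) in self.interface:
            other.push(lit, EXT)          # interface literals are shared
        else:
            other.push(None, "decision")  # null literal keeps levels aligned

    def propagate(self, module, lit, reason):
        """Record a propagated literal and share it if it is an interface literal."""
        module.push(lit, reason)
        if abs(lit) in self.interface:
            self.other(module).push(lit, EXT)

    def backjump(self, level):
        self.secondary.backjump(level)
        self.main.backjump(level)
```

The restriction that Ss makes no decisions until Sm has a satisfying assignment is a property of the search loop and is deliberately left out of this sketch.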


Inter-modular propagation and conflict analysis. The two key features of SMS 
are inter-modular unit propagation (called PROPAGATEALL in [1]) and the 
corresponding inter-modular conflict analysis. In PROPAGATEALL, whenever an 
interface literal is added to the trail of one solver, it is added to the trail of the 
other, and both solvers run unit propagation. Whenever a unit literal ℓ is copied 
from the trail of one solver to the other, the reason for ℓ in the destination solver 
is marked using a marker ext. This indicates that the justification for the unit is 
external to the destination solver³. Propagation continues until either there are 
no more units to propagate or one of the solvers hits a conflict. 

Conflict analysis in SMS is extended to account for units with no reason 
clauses. If such a literal ℓ is used in conflict analysis, its reason is obtained by 
using AnalyzeFinal(ℓ) on the other solver to compute a clause (s ⇒ ℓ) over the 
interface literals. This clause is copied to the requesting solver and is used as the 
missing reason. Multiple such clauses can be copied (or learned) during analysis 
of a single conflict clause — one clause for each literal in the conflict that is 
assigned by the other solver. 

In SMS, it is crucial that AnalyzeFinal(ℓ) always succeeds in generating a reason 
clause over the interface variables. This is ensured by only calling AnalyzeFinal(ℓ) 
in the Ss solver on literals that were added to the trail when Ss was not yet 
making decisions. This can happen in one of two scenarios: either Sm hits a 
conflict due to literals propagated from Ss, in which case AnalyzeFinal is invoked 
in Ss on each literal marked ext in Sm that is involved in the conflict resolution 
to obtain its reason; or Ss hits a conflict during unit propagation, in which case 
it invokes AnalyzeFinal to obtain a conflict clause over the interface variables 
that blocks the partial assignment of Sm. In both cases, new reason clauses are 
always copied from Ss to Sm. We refer the reader to [1] for the pseudo-code and 
details of the above inter-modular procedures. 


3.2 Speculative Sat Modulo Sat 


SPECSMS extends SMS [1] by a combination of speculation, refinement, and 
validation. During the search in the main solver Sm, SPECSMS non-deterministically 
speculates by allowing the secondary solver Ss to extend the current partial 
assignment of Φm to a satisfying assignment of Φs. If Ss is unsuccessful (i.e., hits 
a conflict), and the conflict depends on a combination of a local decision of Sm 
with some decision of Ss, then the search reverts to Sm and its partial 
assignment is refined by forcing Sm to decide on an interface literal from the conflict. 
On the other hand, if Ss is successful, solving switches to the main solver Sm 
that validates the current partial assignment by extending it to all of its clauses. 
This either succeeds (meaning, (Φs, Φm) is sat), or fails and another refinement 
is initiated. Note that the two sub-cases where Ss is unsuccessful but the reason 
for the conflict is either local to Ss or local to Sm are handled as in SMS. 


3 This is similar to theory propagation in SMT solvers. 


Fig. 1: State transitions of SPECSMS. A state (P, D^i) means that the secondary 
solver Ss is in propagate mode and the main solver Sm is in decide mode. Each 
edge is guarded with a condition. The condition Sm : SAT means that Sm found 
a full satisfying assignment to Φm. The condition Sm : C@<j means that Sm hit 
a conflict at a decision level below j. The four states in yellow correspond to 
SMS; two states in green are unique to SPECSMS. 


Search modes. SPECSMS controls the behavior of the solvers and their interaction 
through search modes. Each solver can be in one of the following search modes: 
Decide, Propagate, and Finished. In Decide, written D^i, the solver treats all 
decisions below level i as assumptions and is allowed to both make decisions and 
do unit propagation. In Propagate, written P, the solver makes no decisions, but 
does unit propagation whenever new literals are added to its trail. In Finished, 
written F, the clause database of the solver is satisfied; the solver neither makes 
decisions nor propagates unit literals. 


The pair of search modes of both modules is called the state of SPECSMS, 
where we add a unique state called unsat for the case when the combination 
of the modules is known to be unsatisfiable. The possible states and transitions 
of SPECSMS are shown in Fig. 1. States unsat and (F, F) are two final states, 
corresponding to unsat and sat, respectively. In all other states, exactly one of 
the solvers is in Decide mode. We refer to this solver as active. The part of the 
transition system highlighted in yellow corresponds to SMS, and the green part 
includes the states and transitions that are unique to SPECSMS. 

Normal execution with bi-directional propagation. SPECSMS starts in the state 
(P, D^0), with the main solver being active. In this state, it can proceed like SMS 
by staying in the yellow region of Fig. 1. We call this normal execution with 
bi-directional propagation, since (only) unit propagation goes between solvers. 


Speculation. What sets SPECSMS apart is speculation: at any non-deterministically 
chosen decision level i, SPECSMS can pause deciding on the main solver and 
activate the secondary solver (i.e., transition to state (D^i, P)). During speculation, 
only the secondary solver makes decisions. Since the main solver does not have 
a full satisfying assignment to its clauses, the secondary solver propagates as- 
signments to the main solver and vice-versa. 

Speculation terminates when the secondary solver Ss either: (1) hits a conflict 
that cannot be resolved by inter-modular conflict analysis; (2) hits a conflict 
below decision level i; or (3) finds a satisfying assignment to Φs. 

Case (1) is most interesting, and is what makes SPECSMS differ from SMS. 
Note that a conflict clause is not resolved by inter-modular conflict analysis only 
if it depends on an external literal on the trail of Ss that cannot be explained 
by an interface clause from Sm. This is possible when both Sm and Ss have 
partial assignments during speculation. So the conflict might depend on the 
local decisions of Sm. This cannot be communicated to Ss using only interface 
variables. 


Refinement. In SPECSMS, this is handled by modifying the REASON method in 
the solvers to fail (i.e., return ext) whenever AnalyzeFinal returns a non-interface 
clause. Additionally, the literal on which AnalyzeFinal failed is recorded in a 
global variable refineLit. This is shown in Alg. 1. The inter-modular conflict 
analysis is modified to exit early whenever REASON fails to produce a 
justification. At this point, SPECSMS exits speculation, returns to the initial state 
(P, D^0), both solvers back-jump to decision level i at which speculation was 
initiated, and Sm is forced to decide on refineLit. 

We call this transition a refinement because the partial assignment of the 
main solver Sm (which we view as an abstraction) is updated (a.k.a., refined) 
based on the information that was not available to it (namely, a conflict with a 
set of decisions in the secondary solver Ss). Since refineLit was not decided on 
in Sm prior to speculation, deciding on it is a new decision that ensures progress 
in Sm. The next speculation is possible only under strictly more decisions in Sm 
than before, or when Sm back-jumps and flips an earlier decision. 

We illustrate the refinement process on a simple example: 


Example 3. Consider the query (Φs, Φm) with: 


Φs(i, j, k, z):               Φm(a, i, j, k): 
¬z ∨ ¬i           (3)         ¬a ∨ i ∨ j     (1) 
i ∨ ¬j ∨ ¬k       (4)         ¬j ∨ k         (2) 


First, Sm decides a (at level 1), which causes no propagations. Then, SPECSMS 


50 Hari Govind V K et al. 


enters speculative mode, transitions to (D^1, P) and starts making decisions in 
Ss. Ss decides z and calls PROPAGATEALL. Afterwards, the trails for Sm and Ss 
are as follows: 


Sm | a @ 1    | null @ 2 | ¬i (ext) | j (1)   | k (2) 
Ss | null @ 1 | z @ 2    | ¬i (3)   | j (ext) | k (ext) 


where x @ i denotes that literal x is decided at level i, and x (r) denotes that 
literal x is propagated using a reason clause r, or due to the other solver (if 
r = ext). A conflict is hit in Ss in clause (4). Inter-modular conflict analysis 
begins. Ss first asks for the reason for k, which is clause (2) in Sm. This clause is 
copied to Ss. Note that unlike SMS, clauses can move from Sm to Ss. The new 
conflict to be analyzed is (i ∨ ¬j ∨ ¬j). Now the reason for j is asked of Sm. In this 
case, Sm cannot produce a clause over shared variables to justify j, so conflict 
analysis fails with refineLit = j. This causes SPECSMS to exit speculation mode 
and move to state (P, D^0) and Sm must decide variable j before speculating 
again. In this case either decision on j results in (Φs, Φm) being sat. 
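The closing claim of Example 3 is small enough to check by brute force. The sketch below assumes the clause polarities (3) ¬z ∨ ¬i, (4) i ∨ ¬j ∨ ¬k, (1) ¬a ∨ i ∨ j, and (2) ¬j ∨ k, which is our reading of the example and should be treated as an assumption; the helper names `satisfies` and `models_with` are hypothetical.

```python
from itertools import product

VARS = ["a", "i", "j", "k", "z"]

# Each literal is (variable, polarity); polarity False means negated.
PHI_S = [
    [("z", False), ("i", False)],                # clause (3)
    [("i", True), ("j", False), ("k", False)],   # clause (4)
]
PHI_M = [
    [("a", False), ("i", True), ("j", True)],    # clause (1)
    [("j", False), ("k", True)],                 # clause (2)
]


def satisfies(assignment, clauses):
    """True if every clause has at least one literal made true."""
    return all(
        any(assignment[var] == polarity for var, polarity in clause)
        for clause in clauses
    )


def models_with(fixed):
    """Enumerate all models of PHI_S and PHI_M that agree with `fixed`."""
    free = [v for v in VARS if v not in fixed]
    found = []
    for values in product([False, True], repeat=len(free)):
        assignment = dict(fixed, **dict(zip(free, values)))
        if satisfies(assignment, PHI_S + PHI_M):
            found.append(assignment)
    return found
```

Under these assumptions, `models_with({"j": True})` and `models_with({"j": False})` are both non-empty, while the speculative prefix `{"a": True, "z": True}` has no model, which matches the conflict hit in clause (4).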


In addition to refining when conflict analysis fails, SPECSMS also has the 
ability to refine non-deterministically. That is, at any point during speculation, 
Ss can decide to stop speculation, back-jump to the decision level from which it 
started speculation, and choose any interface literal as refineLit. 

Case (2) is similar to what happens in SMS when a conflict is detected in Ss. 
The reason for the conflict is below level i which is below the level of any decision 
of Ss. Since decision levels below i are treated as assumptions in Ss, calling 
AnalyzeFinal in Ss returns an interface clause c that blocks the current assignment 
in Sm. The clause c is added to Sm. The solvers back-jump to the smallest 
decision level j that makes c an asserting clause in Sm. Finally, SPECSMS moves 
to (P, D^0). 


Validation. Case (3), like Case (1), is unique to SPECSMS. While all clauses of 
Ss are satisfied, the current assignment might not satisfy all clauses of Sm. Thus, 
SPECSMS enters validation by switching to the configuration (F, D^M), where M 
is the current decision level. Thus, Sm becomes active and starts deciding and 
propagating. This continues until one of two things happens: (3a) Sm extends 
the assignment to satisfy all of its clauses, or (3b) a conflict that cannot be 
resolved with inter-modular conflict analysis is found. In the case (3a), SPECSMS 
transitions to (F, F) and declares that (Φs, Φm) is sat. The case (3b) is 
handled exactly the same as Case (1): the literal on the trail without a reason is 
stored in refineLit, SPECSMS moves to (P, D^0), backjumps to the level in which 
speculation was started, and Sm is forced to decide on refineLit. 
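The speculation-related transitions described in this section can be summarised as a small transition table. The sketch below covers only the transitions spelled out in the text (speculation, refinement, the low-level conflict of Case (2), and validation); the SMS-only states of Fig. 1 and the unsat state are omitted, decision-level superscripts are dropped, and the event names are hypothetical labels of ours.

```python
from enum import Enum


class Mode(Enum):
    DECIDE = "D"
    PROPAGATE = "P"
    FINISHED = "F"


# A state is (mode of the secondary solver, mode of the main solver).
NORMAL = (Mode.PROPAGATE, Mode.DECIDE)       # (P, D)
SPECULATION = (Mode.DECIDE, Mode.PROPAGATE)  # (D, P)
VALIDATION = (Mode.FINISHED, Mode.DECIDE)    # (F, D)
SAT = (Mode.FINISHED, Mode.FINISHED)         # (F, F)

TRANSITIONS = {
    (NORMAL, "speculate"): SPECULATION,
    (SPECULATION, "unexplained_conflict"): NORMAL,   # Case (1): refinement
    (SPECULATION, "conflict_below_level"): NORMAL,   # Case (2)
    (SPECULATION, "secondary_sat"): VALIDATION,      # Case (3)
    (VALIDATION, "main_sat"): SAT,                   # Case (3a)
    (VALIDATION, "unexplained_conflict"): NORMAL,    # Case (3b): refinement
}


def step(state, event):
    """Follow one transition; raises KeyError for events not covered here."""
    return TRANSITIONS[(state, event)]


def active(state):
    """Name the solver in Decide mode, or None in a final state."""
    secondary, main = state
    if secondary is Mode.DECIDE:
        return "secondary"
    if main is Mode.DECIDE:
        return "main"
    return None
```

In every non-final state of this table exactly one solver is in Decide mode, which is the invariant the text uses to define the active solver.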


Theorem 1. SPECSMS terminates. If it reaches the state (F, F), then Φs ∧ Φm 
is satisfiable and the join of the trails of (Ss, Sm) is a satisfying assignment. If 
it reaches the state unsat, Φs ∧ Φm is unsatisfiable. 


Algorithm 1 The REASON method in modular SAT solvers inside SPECSMS 

function REASON(lit) 
    if reason[lit] = ext then 
        c ← other.AnalyzeFinal(lit) 
        if ∃v ∈ c · v ∉ I then 
            refineLit ← lit 
            return ext 
        ADDCLAUSE(c) 
        reason[lit] ← c 
    return reason[lit] 
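Algorithm 1 can be rendered in Python roughly as follows. This is a sketch for illustration, not the Z3 implementation: `AnalyzeFinal` is replaced by a hypothetical callback `analyze_final` supplied by the caller, the global variable refineLit is modelled by the dictionary `refine`, and literals are DIMACS-style integers.

```python
EXT = "ext"  # marker: the literal was propagated by the other module

refine = {"lit": None}  # stands in for the global variable refineLit


class Module:
    def __init__(self, interface_vars, analyze_final):
        self.interface = set(interface_vars)  # shared variables I
        self.analyze_final = analyze_final    # hypothetical hook: literal -> clause
        self.reason = {}                      # literal -> clause or EXT
        self.clauses = []                     # clause database
        self.other = None                     # the peer module


def reason_of(module, lit):
    """Return the reason clause for `lit`, or EXT if the other module can
    only explain it with a clause that mentions non-interface variables."""
    if module.reason[lit] == EXT:
        clause = module.other.analyze_final(lit)
        if any(abs(v) not in module.interface for v in clause):
            refine["lit"] = lit   # remember where the explanation failed
            return EXT
        module.clauses.append(clause)
        module.reason[lit] = clause
    return module.reason[lit]
```

With the variable numbering a=1, i=2, j=3, k=4 for Example 3, an explanation (¬j ∨ k) for k is accepted and copied, while an explanation (¬a ∨ i ∨ j) for j mentions the local variable a, so the call fails and records j as the refinement literal.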


4 Validation and interpolation 


In this section, we augment SPECSMS with an interpolation procedure. To this 
end, we first introduce modular DRUP proofs, which are generated from SPEC- 
SMS in a natural way. We then present an algorithm for extracting an inter- 
polant from a modular trimmed DRUP proof in the spirit of [11]. 


4.1 DRUP proofs for modular SAT 


Modular DRUP proofs, a form of clausal proofs [9], extend (monolithic) DRUP 
proofs [12]. A DRUP proof [12] is a sequence of steps, where each step either 
asserts a clause, deletes a clause, or adds a new Reverse Unit Propagation (RUP) 
clause. Given a set of clauses Γ, a clause cls is an RUP for Γ, written Γ ⊢_UP cls, if 
cls follows from Γ by unit propagation [8]. For a DRUP proof π, let ASSERTED(π) 
denote all clauses of the asserted commands in π; then π shows that all RUP 
clauses of π follow from ASSERTED(π). If π contains a ⊥ clause, then π certifies 
that ASSERTED(π) is unsat. 
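The RUP check itself is short: assume the negation of the candidate clause and run unit propagation until a conflict appears. The following is a naive sketch for illustration (quadratic, no watched literals), with clauses as collections of DIMACS-style integer literals; the function names are ours.

```python
def unit_propagate(clauses, assignment):
    """Extend `assignment` (a set of true literals) by unit propagation.
    Return False as soon as some clause is falsified, True otherwise."""
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            if any(lit in assignment for lit in clause):
                continue  # clause already satisfied
            unassigned = [lit for lit in clause if -lit not in assignment]
            if not unassigned:
                return False  # every literal is false: conflict
            if len(unassigned) == 1:
                assignment.add(unassigned[0])
                changed = True
    return True


def is_rup(clauses, cls):
    """True if `cls` follows from `clauses` by reverse unit propagation."""
    assignment = {-lit for lit in cls}
    return not unit_propagate(clauses, assignment)
```

For example, from (x1 ∨ x2) and (¬x1 ∨ x2) the unit clause (x2) is RUP, while (x1) is not; the empty clause is RUP exactly when propagation on the clause set alone reaches a conflict.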

A Modular DRUP proof is a sequence of clause addition and deletion steps, 
annotated with indices idx (m or s). Intuitively, steps with the same index must 
be validated together (within the same module idx), and steps with different 
indices may be checked independently. The steps are: 


1. (asserted, idx, cls) denotes that cls is asserted in idx, 
2. (rup, idx, cls) denotes adding RUP clause cls to idx, 
3. (cp(src), dst, cls) denotes copying a clause cls from src to dst, and 
4. (del, idx, cls) denotes removing clause cls from idx. 


We denote the prefix of length k of a sequence of steps π by π^k. Given 
a sequence of steps π and a formula index idx, we use act_clauses(π, idx) to 
denote the set of active clauses with index idx. Formally, 

\[
\mathit{act\_clauses}(\pi, idx) \triangleq \{\, cls \mid \exists c_j \in \pi \cdot
  (c_j = (t, idx, cls) \land (t = \mathit{asserted} \lor t = \mathit{rup} \lor t = \mathit{cp}(\_)))
  \land \nexists c_k \in \pi \cdot k > j \land c_k = (\mathit{del}, idx, cls) \,\}
\]
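Operationally, the set of active clauses can be computed by a single pass over the proof. The sketch below assumes a hypothetical encoding of steps as tuples `(kind, idx, cls)`, where `kind` is one of the strings `"asserted"`, `"rup"`, `"del"`, `"cp(s)"`, or `"cp(m)"`, `idx` is the module the step applies to (the destination for copy steps), and `cls` is a `frozenset` of integer literals.

```python
def act_clauses(proof, idx):
    """Clauses of module `idx` that were added (asserted, rup, or copied in)
    and not deleted by a later del step of the same module."""
    active = set()
    for kind, step_idx, cls in proof:
        if step_idx != idx:
            continue
        if kind == "del":
            active.discard(cls)
        else:  # "asserted", "rup", or a copy into this module
            active.add(cls)
    return active
```

A clause ends up in the result exactly when its last addition in module `idx` is not followed by a deletion, which is the condition stated in the formal definition.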
seq  step      to  clause 
 1   asserted  m   ¬s1 ⇒ lb1 
 2   asserted  m   ¬s1 ⇒ ¬lb1 
 3   asserted  s   (s1 ∧ la1) ⇒ s2 
 4   asserted  s   (s1 ∧ ¬la1) ⇒ s2 
 5   asserted  m   (s2 ∧ lb2) ⇒ s3 
 6   asserted  m   (s2 ∧ ¬lb2) ⇒ s3 
 7   asserted  s   (s3 ∧ la2) ⇒ s4 
 8   asserted  s   (s3 ∧ ¬la2) ⇒ s4 
 9   asserted  m   s4 ⇒ lb3 
10   asserted  m   s4 ⇒ ¬lb3 
11   rup       m   s1 
12   rup       m   ¬s4 
13   rup       m   s2 ⇒ s3 
14   cp(m)     s   s2 ⇒ s3 
15   rup       s   s3 ⇒ s4 
16   rup       s   s1 ⇒ s4 
17   cp(s)     m   s1 ⇒ s4 
18   rup       m   ⊥ 


Fig. 2: An example of a modular DRUP proof. Clauses are written in human- 
readable form as implications, instead of in the DIMACS format. 


A sequence of steps π = c_1, ..., c_n is a valid modular DRUP proof iff for each 
c_i ∈ π: 
1. if c_i = (rup, idx, cls) then act_clauses(π^i, idx) ⊢_UP cls, 
2. if c_i = (cp(idx), _, cls) then act_clauses(π^i, idx) ⊢_UP cls, and 
3. c_|π| is either (rup, m, ⊥) or (cp(s), m, ⊥). 
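The three validity conditions can be checked directly, if inefficiently, by recomputing the active clauses before each step. The sketch below is an illustration under assumptions: steps are tuples `(kind, idx, cls)` with `kind` in `"asserted"`, `"rup"`, `"del"`, `"cp(s)"`, `"cp(m)"` and `cls` a `frozenset` of integer literals (a hypothetical encoding), and each step is checked against the steps strictly before it, which we assume is the intended reading of the prefix in conditions 1 and 2.

```python
def is_rup(clauses, cls):
    """Naive reverse unit propagation: negate `cls` and look for a conflict."""
    assignment = {-lit for lit in cls}
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            if any(lit in assignment for lit in clause):
                continue
            unassigned = [lit for lit in clause if -lit not in assignment]
            if not unassigned:
                return True  # conflict reached
            if len(unassigned) == 1:
                assignment.add(unassigned[0])
                changed = True
    return False


def act_clauses(proof, idx):
    """Active clauses of module `idx` after the given steps."""
    active = set()
    for kind, step_idx, cls in proof:
        if step_idx == idx:
            if kind == "del":
                active.discard(cls)
            else:
                active.add(cls)
    return active


def is_valid_modular_drup(proof):
    for i, (kind, idx, cls) in enumerate(proof):
        earlier = proof[:i]
        if kind == "rup" and not is_rup(act_clauses(earlier, idx), cls):
            return False  # condition 1 fails
        if kind.startswith("cp("):
            src = kind[3:-1]  # a copied clause must be RUP in its source
            if not is_rup(act_clauses(earlier, src), cls):
                return False  # condition 2 fails
    if not proof:
        return False
    last_kind, last_idx, last_cls = proof[-1]
    # Condition 3: the proof ends by deriving or copying the empty clause in m.
    return last_idx == "m" and len(last_cls) == 0 and last_kind in ("rup", "cp(s)")
```

As a tiny instance, with (x) asserted in s and (¬x) asserted in m, the proof becomes valid once (x) is copied from s to m and the empty clause is then added to m as a RUP step; without the copy step the final RUP check fails.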


Let ASSERTED(π, idx) be the set of all asserted clauses in π with index idx. 


Theorem 2. If π is a valid modular DRUP proof, then ASSERTED(π, s) ∧ 
ASSERTED(π, m) is unsatisfiable. 


Modular DRUP proofs may be validated with either one or two solvers. To 
validate with one solver we convert the modular proof into a monolithic one 
(i.e., where the steps are asserted, rup, and del). Let MODDRUP2DRUP be a 
procedure that given a modular DRUP proof π, returns a DRUP proof π' that 
is obtained from π by (a) removing idx from all the steps; (b) removing all cp 
steps; (c) removing all del steps. Note that del steps are removed for simplicity, 
otherwise it is necessary to account for deletion of copied and non-copied clauses 
separately. 
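The conversion just described amounts to a filter over the proof steps. The sketch below uses a hypothetical tuple encoding `(kind, idx, cls)` with `kind` in `"asserted"`, `"rup"`, `"del"`, `"cp(s)"`, `"cp(m)"`; the function name `mod_drup_to_drup` is ours and is not taken from the implementation.

```python
def mod_drup_to_drup(proof):
    """Turn a modular DRUP proof into a monolithic one by (a) dropping the
    module index, (b) dropping copy steps, and (c) dropping del steps."""
    monolithic = []
    for kind, _idx, cls in proof:
        if kind == "del" or kind.startswith("cp("):
            continue
        monolithic.append((kind, cls))
    return monolithic
```

Dropping the copy steps loses nothing in the monolithic setting, since a copied clause is already present there through the step that introduced it in its source module.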


Lemma 1. If π is a valid modular DRUP proof then π' = MODDRUP2DRUP(π) 
is a valid DRUP proof. 


Modular validation is done with two monolithic solvers working in lock step: 
(asserted, idx, cls) steps are added to the idx solver; (rup, idx, cls) steps are 
validated locally in solver idx using all active clauses (asserted, copied, and rup); 
and for (cp(src), dst, cls) steps, cls is added to dst but not validated in it, and 
cls is checked to exist in the src solver. 

From now on, we consider only valid proofs. We say that a (valid) modular 
DRUP proof π is a proof of unsatisfiability of Φs ∧ Φm if ASSERTED(π, s) ⊆ Φs 
and ASSERTED(π, m) ⊆ Φm (inclusion here refers to the sets of clauses). 

SPECSMS produces modular DRUP proofs by logging the clauses that are 
learnt, deleted, and copied between solvers. Note that in SMS clauses may only be 
copied from Ss to Sm, but in SPECSMS they might be copied in both directions. 


Theorem 3. Let Φs and Φm be two Boolean formulas s.t. Φs ∧ Φm ⇒ ⊥. SPECSMS 
produces a valid modular DRUP proof for unsatisfiability of Φs ∧ Φm. 


Algorithm 2 Trimming a modular DRUP proof 

Input: Solver instances Ss, Sm with the empty clause on the trail, and a modular 
       clausal proof π = c_1, ..., c_n. 
Output: A proof π' s.t. all steps are core. 
 1: π' ← ⟨⟩ 
 2: M_s, M_m ← {⊥}, ∅                  ▷ Relevant clauses 
 3: for i = n to 0 do 
 4:     match c_i with (type, idx, cls) 
 5:     if cls ∉ M_idx then continue 
 6:     if type = del then 
 7:         S_idx.Revive(cls) 
 8:         continue 
 9:     π'.append(c_i) 
10:     if type = rup then 
11:         S_idx.CHK_RUP(cls, M_idx) 
12:     else if type = cp(src) then 
13:         S_idx.Delete(cls) 
14:         M_src.add(cls) 
15: π'.reverse() 
16: function SOLVER::CHK_RUP(cls, M) 
17:     if IsOnTrail(cls) then 
18:         UndoTrail(cls) 
19:     Delete(cls) 
20:     SaveTrail() 
21:     Enqueue(¬cls) 
22:     r ← Propagate() 
23:     ConflictAnalysis(r, M)          ▷ Updates M with conflict clauses 
24:     RestoreTrail() 


Algorithm 3 Interpolating a modular DRUP proof 

Input: Propositional formulas (Φ0, Φ1) 
Input: A modular trimmed DRUP proof π = c_1, ..., c_n of unsatisfiability 
       of Φ0 ∧ Φ1 
Output: An interpolant itp s.t. Φ0 ⇒ itp and itp ∧ Φ1 ⇒ ⊥ 
 1: Ss, Sm ← SAT_SOLVER() 
 2: itp ← ⊤ 
 3: for i = 0 to n do 
 4:     match c_i 
 5:     with (asserted, s, cls): 
 6:         sup(cls) ← ⊤ 
 7:     with (cp(m), s, cls): 
 8:         sup(cls) ← cls 
 9:     with (rup, s, cls): 
10:         M ← ∅ 
11:         Ss.CHK_RUP(cls, M) 
12:         sup(cls) ← {sup(c) | c ∈ M} 
13:     with (cp(s), m, cls): 
14:         itp ← itp ∧ (sup(cls) ⇒ cls) 
15:     S_idx.add(cls) 


Trimming modular DRUP proofs. A step in a modular DRUP proof π is core if
removing it invalidates π. Under this definition, del steps are never core since
removing them does not affect validation. Alg. 2 shows an algorithm to trim
modular DRUP proofs based on backward validation. The inputs are two modular
solvers S_m and S_s in a final conflicting state, and a valid modular DRUP proof
π = c_1, …, c_n. The output is a trimmed proof π′ s.t. all steps of π′ are core.

We assume that the reader is familiar with MiniSAT [6] and use the following
solver methods: Propagate exhaustively applies the unit propagation (UP) rule by
resolving all unit clauses; ConflictAnalysis analyzes the most recent conflict and
marks which clauses are involved in the conflict; IsOnTrail checks whether a clause
is an antecedent of a literal on the trail; Enqueue enqueues one or more literals
on the trail; IsDeleted, Delete, and Revive check whether a clause is deleted, delete a
clause, and add a previously deleted clause, respectively; SaveTrail and RestoreTrail
save and restore the state of the trail.

Alg. 2 processes the steps of the proof backwards, rolling back the states
of the solvers. M_idx marks which clauses were relevant to derive clauses in the
current suffix of the proof. While the proof is constructed through inter-modular
reasoning, the trimming algorithm processes each of the steps in the proof
completely locally. During the backward construction of the trimmed proof, steps
that include unmarked clauses are ignored (and, in particular, not added to the
proof). For each (relevant) rup step, function CHK_RUP, using ConflictAnalysis,
adds clauses to M. del steps are never added to the trimmed proof, but the clause
is revived in the solver. For cp steps, if the clause was marked, it is marked as
used for the solver it was copied from and the step is added to the proof. Finally,
asserted clauses that were marked are added to the trimmed proof. Note that,
as in [11], proofs may be trimmed in different ways, depending on the strategy
for ConflictAnalysis.

The following theorem states that trimming preserves validity of the proof:


Theorem 4. Let Φ_s and Φ_m be two formulas such that Φ_s ∧ Φ_m ⇒ ⊥. If π is a
modular DRUP proof produced by solvers S_s and S_m for Φ_s ∧ Φ_m, then a trimmed
proof π′ produced by Alg. 2 is also a valid modular DRUP proof for Φ_s ∧ Φ_m.


Fig. 2 shows a trimmed proof after SPECSMS is executed on ⟨ψ_0, ψ_1⟩ such
that ψ_0 ≜ ((s1 ∧ la1) ⇒ s2) ∧ ((s1 ∧ ¬la1) ⇒ s2) ∧ ((s3 ∧ la2) ⇒ s4) ∧ ((s3 ∧ ¬la2) ⇒ s4)
and ψ_1 ≜ (¬s1 ⇒ lb1) ∧ (¬s1 ⇒ ¬lb1) ∧ ((s2 ∧ lb2) ⇒ s3) ∧ ((s2 ∧ ¬lb2) ⇒ s3) ∧
(s4 ⇒ lb3) ∧ (s4 ⇒ ¬lb3).


4.2 Interpolation 


Given a modular DRUP proof π of unsatisfiability of Φ_s ∧ Φ_m, we give an algorithm
to compute an interpolant of Φ_s ∧ Φ_m. For simplicity of the presentation,
we assume that π has no deletion steps; this is the case in trimmed proofs, but
we can also adapt the interpolation algorithm to handle deletions by keeping
track of active clauses.

Our interpolation algorithm relies only on the clauses copied between the
modules. Notice that whenever a clause is copied from module i to module j, it
is implied by all the clauses in Φ_i together with all the clauses that have been
copied from module j. We refer to clauses copied from S_m to S_s as backward
clauses and clauses copied from S_s to S_m as forward clauses. The conjunction of
forward clauses is unsatisfiable with Φ_m. This is because, in the last step of π, ⊥
is added to S_m, either through rup or by cp ⊥ from S_s. Since all the clauses in
module m are implied by Φ_m together with forward clauses, this means that the
conjunction of forward clauses is unsatisfiable with Φ_m. In addition, all forward
clauses were learned in module s, with support from backward clauses. This
means that every forward clause is implied by Φ_s together with the subset of the
backward clauses used to derive it. Intuitively, we should therefore be able to
learn an interpolant with the structure: backward clauses imply forward clauses.

Alg. 3 describes our interpolation algorithm. It traverses a modular DRUP
proof forward. For each clause cls learned in module s, the algorithm collects the
set of backward clauses used to learn cls. This is stored in the sup data structure,
a mapping from clauses to sets of clauses. Finally, when a forward clause c is
copied, it adds sup(c) ⇒ c to the interpolant.


Example 4. We illustrate our algorithm using the modular DRUP proof from
Fig. 2. On the first cp step (cp(m), s, s2 ⇒ s3), the algorithm assigns the
sup for clause s2 ⇒ s3 as itself (line 8). The first clause learnt in module s,
(rup, s, s3 ⇒ s4), is derived from just the clauses in module s and no backward
clauses. Therefore, after RUP, our algorithm sets sup(s3 ⇒ s4) to ⊤ (line 12).
The second clause learnt in module s, s1 ⇒ s4, is derived from module s with the
support of the backward clause s2 ⇒ s3. Therefore, sup(s1 ⇒ s4) = {s2 ⇒ s3}.
When this clause is copied forward to module m, the algorithm updates the
interpolant to be (s2 ⇒ s3) ⇒ (s1 ⇒ s4).
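The interpolant of Example 4 can be checked by brute force. The Python sketch below enumerates all assignments and confirms the two interpolant conditions. The formulas are written as we read them from the description of Fig. 2, and assigning the first formula to module s and the second to module m is an assumption inferred from the example. All function names are hypothetical.

```python
from itertools import product


def implies(p, q):
    return (not p) or q


def phi_s(s1, s2, s3, s4, la1, la2):
    # Assumed formula of the secondary module s (local variables la1, la2).
    return (implies(s1 and la1, s2) and implies(s1 and not la1, s2)
            and implies(s3 and la2, s4) and implies(s3 and not la2, s4))


def phi_m(s1, s2, s3, s4, lb1, lb2, lb3):
    # Assumed formula of the main module m (local variables lb1, lb2, lb3).
    return (implies(not s1, lb1) and implies(not s1, not lb1)
            and implies(s2 and lb2, s3) and implies(s2 and not lb2, s3)
            and implies(s4, lb3) and implies(s4, not lb3))


def itp(s1, s2, s3, s4):
    # Interpolant from Example 4: (s2 => s3) => (s1 => s4), over shared variables only.
    return implies(implies(s2, s3), implies(s1, s4))


bools = (False, True)

# Condition 1: phi_s implies itp under every assignment.
s_implies_itp = all(
    implies(phi_s(*shared, la1, la2), itp(*shared))
    for shared in product(bools, repeat=4)
    for la1, la2 in product(bools, repeat=2)
)

# Condition 2: itp together with phi_m has no satisfying assignment.
itp_refutes_m = not any(
    itp(*shared) and phi_m(*shared, *local)
    for shared in product(bools, repeat=4)
    for local in product(bools, repeat=3)
)
```

Both flags evaluate to true, and itp mentions only the shared variables s1, …, s4, which is the third interpolant condition.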


Next, we formalize the correctness of the algorithm. Let L_B(π) = {cls |
(cp(m), s, cls) ∈ π} be the set of clauses copied from module m to s and L_F(π) =
{cls | (cp(s), m, cls) ∈ π} be clauses copied from module s to m. From the validity
of modular DRUP proofs, we have that:


Lemma 2. For any step c_i = (cp(s), m, cls) ∈ π, (L_B(π^i) ∧ Φ_s) ⇒ cls, and for
any step c_j = (cp(m), s, cls) ∈ π, (L_F(π^j) ∧ Φ_m) ⇒ cls.


For any clause cls copied from one module to the other, we use the shorthand
♯(cls) to refer to the position of the copy command in the proof π. That is, ♯(cls)
is the smallest k such that c_k = (cp(i), j, cls) ∈ π. The following is an invariant
in a valid modular DRUP proof:


Lemma 3.
\[ \forall\, cls \in L_F(\pi) \,.\; \big(\Phi_m \land L_F(\pi^{\sharp(cls)})\big) \Rightarrow L_B(\pi^{\sharp(cls)}) \]


These properties ensure that adding L_B(π^{♯(cls)}) ⇒ cls for every forward
clause cls results in an interpolant. Alg. 3 adds (sup(cls) ⇒ cls) as an optimization.
Correctness is preserved since sup(cls) is a subset of L_B(π^{♯(cls)}) that
together with Φ_s suffices to derive cls (formally, sup(cls) ∧ Φ_s ⊢_UP cls).


Theorem 5. Given a modular DRUP proof π for Φ_s ∧ Φ_m, itp ≜ ⋀{sup(c) ⇒ c |
c ∈ L_F(π)} is an interpolant for ⟨Φ_s, Φ_m⟩.

Proof. Since all copy steps are over interface variables, the interpolant is also
over interface variables. By Lemma 2 (and the soundness of the sup optimization),
Φ_s ⇒ itp. Next, we prove that (Φ_m ∧ itp) ⇒ ⊥. From Lemma 3, we have that for
all c ∈ L_F(π), (Φ_m ∧ L_F(π^{♯(c)})) ⇒ sup(c). Therefore, (Φ_m ∧ L_F(π^{♯(c)}) ∧ (sup(c) ⇒
c)) ⇒ c. By induction on the position of the copy steps, Φ_m ∧ itp implies every
forward clause. Since the conjunction of forward clauses is unsatisfiable with Φ_m,
we obtain (Φ_m ∧ itp) ⇒ ⊥.


It is much simpler to extract interpolants from modular DRUP proofs than
from arbitrary DRUP proofs. This is not surprising since the interpolants capture 
exactly the information that is exchanged between solvers. The interpolants are 
not in CNF, but can be converted to CNF after extraction. 


5 Heuristics for guiding specSMS 


Theoretically, speculation makes SPECSMS more powerful than SMS and 
IC3/PDR. However, in practice, deciding when to enter speculation has a ma- 
jor impact on the performance of SPECSMS. If the speculation is too greedy, 
SPECSMS performs poorly on examples where the main module is easy to solve. 
Similarly, if the speculation is too lazy, SPECSMS performs poorly on problems 
in which any solution to the secondary module makes the main module easy to 
solve. We illustrate this trade-off using an example. 


Example 5. Consider a modular query: ⟨ψ_in(ℓ, x, in), ψ_SHA-1(in, x, out)⟩, where
x is a 512-bit vector, ℓ is a 160-bit vector, chks_i are 512-bit vectors, and the
remaining variables are the same as in ψ_in and ψ_SHA-1, and

\[
\begin{aligned}
\psi_{in} &\triangleq \text{SHA-1}_{circ}(x, \ell) \land {} \\
&\quad \big((\ell = chks_0 \land in = msg_0) \lor (\ell = chks_1 \land in = msg_1) \lor {} \\
&\quad\;\; (\ell = chks_2 \land in = msg_2) \lor (\ell = chks_3 \land in = msg_3)\big) \\
\psi_{\text{SHA-1}} &\triangleq (x = 1 \lor x = 4) \land \text{SHA-1}_{circ}(in, out) \land out = shaVal
\end{aligned}
\]

This is an example where bi-directional search is necessary to efficiently solve the
query. If deciding only on ψ_SHA-1, we encounter the hard problem of inverting
SHA-1_circ; if deciding in ψ_in, we encounter the same problem, since an assignment
for x needs to be found, based on the four values for ℓ. Therefore, neither
immediate nor late speculation makes SPECSMS efficient on the problem. The
ideal strategy here is to speculate after an assignment to x, to simplify ψ_in.


Ideally, we would like to speculate when the current modular query is too 
hard for the solver. As a proxy for hardness, we measure the number of conflicts 
the SAT solver hits. We first speculate when the main solver hits a predeter- 
mined number of conflicts. We then exponentially widen the number of conflicts 
between speculations. Exiting from speculation is just as important as entering 
speculation: the secondary solver might also get stuck in solving its module. 
Therefore, we use the same heuristic in the secondary solver to exit speculation. 

While this is a simple heuristic, we found it to be useful in our benchmarks. 
The best strategy for speculation is problem-dependent. We leave development 
of a robust heuristic for future work. 
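The conflict-based heuristic can be sketched as a small counter with an exponentially growing budget. The class below is illustrative only: the class name, the initial budget, and the growth factor of 2 are assumptions, since the text does not state concrete values.

```python
class SpeculationBudget:
    """Conflict budget that widens exponentially after each switch."""

    def __init__(self, initial_budget=100, growth=2):
        # Both defaults are placeholders, not values taken from the paper.
        self.budget = initial_budget  # conflicts allowed before the next switch
        self.growth = growth
        self.conflicts = 0

    def on_conflict(self):
        """Record one conflict; return True when the solver should switch modules."""
        self.conflicts += 1
        if self.conflicts < self.budget:
            return False
        self.conflicts = 0
        self.budget *= self.growth  # widen the gap between speculations
        return True


# With an initial budget of 2, switches happen after 2, then 4, then 8 conflicts.
budget = SpeculationBudget(initial_budget=2, growth=2)
switch_points = [i for i in range(1, 15) if budget.on_conflict()]
```

The same counter could be instantiated once per solver, so that the main solver uses it to enter speculation and the secondary solver uses it to exit.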
Table 1: Solving time with a timeout of 600s.

            time (s), sat           time (s), unsat
# rounds    SMS      SPECSMS        SMS      SPECSMS
16          0.86     0.94           1.09     0.93
21          –        0.49           –        1.17
26          –        2.93           –        1.95
31          –        1.33           –        2.06
36          –        1.35           –        2.13
40          –        1.56           –        2.64


6 Implementation and Validation 


We implemented SPECSMS (and SMS) inside the extensible SAT-solver of Z3 [5]⁴.
For SMS, we simply disable speculation.

We have validated SPECSMS on a set of handcrafted benchmarks, based
on Ex. 1. Each benchmark is of the form ⟨ψ_in(ℓ, in), ψ_SHA-1(in, out)⟩, where ℓ is
a 2-bit vector, in is a 512-bit vector (shared), and out is a 160-bit vector. ψ_in encodes
that there are four possible messages:

\[
\psi_{in} \triangleq (\ell = 0 \land in = msg_0) \lor (\ell = 1 \land in = msg_1) \lor
(\ell = 2 \land in = msg_2) \lor (\ell = 3 \land in = msg_3)
\]

and ψ_SHA-1(in, out) encodes the SHA-1 circuit together with some hash:

\[ \psi_{\text{SHA-1}} \triangleq (\text{SHA-1}_{circ}(in) \land out = shaVal) \]

In the first set of experiments, we check sat queries by generating one msg_i in
ψ_in that produces shaVal. In the second set, we check unsat queries, by ensuring
that no msg_i produces shaVal. To evaluate performance, we make ψ_SHA-1 harder
to solve by increasing the number of rounds of the SHA-1 circuit encoded in the
SHA-1_circ clauses. We used SAT-encoding [13]⁵ to generate the SHA-1_circ with
different numbers of rounds (SAT-encoding supports 16 to 40 rounds).

We use the heuristic described in Sec. 5 to decide when to enter and exit 
speculation. Thus, SPECSMS switches modules when it hits too many conflicts 
in the module. In contrast, SMS only switches to the secondary solver after 
finding a full satisfying assignment in the main solver. 

Results for each set of the queries are shown in Tab. 1. Column “# rounds” 
shows the number of SHA-1 rounds encoded in ψ_SHA-1. The problems quickly
become too hard for SMS. At the same time, SPECSMS solves all the queries 
quickly. Furthermore, the run-time of SPECSMS appears to grow linearly with 
the number of rounds. 

The experiments validate our claim that switching between modules is quite
effective in solving the problem. As expected, SMS gets stuck in inverting the
SHA-1 function. It cannot make progress without using information from the
secondary module. In contrast, SPECSMS switches to the secondary module
once it finds that solving SHA-1_circ(in) is hard. Note that, in this problem, the
ideal strategy is to speculate eagerly and then branch on all the ℓ variables.
However, SPECSMS spends some time solving SHA-1_circ(in). It only switches to
the secondary module when it hits many conflicts in SHA-1_circ(in).


4 We will provide the repository URL after the double-blind review process.
5 Available at https://github.com/saeednj/SAT-encoding.


7 Conclusion and Future Work 


Modular SAT-solving is crucial for efficient SAT-based unbounded Model Check- 
ing. Existing techniques, embedded in IC3/PDR [3] and extended in SMS [1], 
trade the efficiency of the solver for the simplicity of conflict resolution. In this 
paper, we propose a new modular SAT-solver, called SPECSMS, that extends 
SMS with truly bi-directional reasoning. We show that it is provably better 
than SMS (and, therefore, IC3/PDR). We implement SPECSMS in Z3 [5], ex- 
tend it with DRUP-style [12] proofs, and proof-based interpolation. This work is 
an avenue to future efficient SAT- and SMT-based Model Checking algorithms. 

In this paper, we rely on a simple heuristic to guide SPECSMS when to 
start speculation and exit speculation. This is sufficient to show the power of 
bi-directional reasoning over uni-directional reasoning on our benchmarks. How- 
ever, other application domains might need more complicated heuristics to make 
this decision. In the future, we plan to explore guiding speculation using a
strategy similar to the one used for guiding restarts in modern CDCL SAT-solvers [2].

A much earlier version of speculation, called weak abstraction, is implemented 
in the SPACER Constrained Horn Clause (CHC) solver [10]. Since SPACER ex- 
tends IC3/PDR to SMT, the choice of speculation is based on theory reasoning. 
Speculation starts when the main solver is satisfied modulo some theories (e.g., 
Linear Real Arithmetic or Weak Theory of Arrays). Speculation often prevents 
SPACER from being stuck in any one SMT query. However, SPACER has no inter- 
modular propagation and no refinement. If validation fails, speculation is simply 
disabled and the query is tried again without it. We hope that extending SPEC- 
SMS to theories will make SPACER heuristics much more flexible and effective. 

DPLL(T)-style [7] SMT-solvers can be seen as modular SAT-solvers where 
the main module is a SAT solver and the secondary solver is a theory solver (often
an EUF solver that is connected to other theory solvers such as a LIA solver). This
observation is credited as an intuition for SMS [1]. In modern SMT-solvers, all
decisions are made by the SAT-solver. For example, if a LIA solver wants to
split on a bound of a variable x, it first adds a clause (x ≤ (b − 1) ∨ x ≥ b), where
b is the desired bound, to the SAT-solver and then lets the SAT-solver branch on
the clause. SPECSMS extends this interaction by allowing the secondary solver 
(i.e., the theory solver) to branch without going back to the main solver. Control 
is returned to the main solver only if such decisions tangle local decisions of the 
two solvers. We hope that the core ideas of SPECSMS can be lifted to SMT 
and allow more flexibility in the interaction between the DPLL-core and theory 
solvers. 
Abstract. Satisfiability solving has been used to tackle a range of long-standing
open math problems in recent years. We add another success by
solving a geometry problem that originated a century ago. In the 1930s, 
Esther Klein’s exploration of unavoidable shapes in planar point sets in 
general position showed that every set of five points includes four points 
in convex position. For a long time, it was open if an empty hexagon, 
i.e., six points in convex position without a point inside, can be avoided. 
In 2006, Gerken and Nicolás independently proved that the answer is no. 
We establish the exact bound: Every 30-point set in the plane in gen- 
eral position contains an empty hexagon. Our key contributions include 
an effective, compact encoding and a search-space partitioning strategy 
enabling linear-time speedups even when using thousands of cores. 


Keywords: Erdős–Szekeres problem · empty hexagon theorem · planar
point set · cube-and-conquer · proof of unsatisfiability


1 Introduction 


In 1932, Esther Klein showed that every set of five points in the plane in general
position (i.e., no three points on a common line) has a subset of four points in
convex position. Shortly after, Erdős and Szekeres [8] generalized this result by
showing that, for every integer k, there exists a smallest integer g(k) such that
every set of g(k) points in the plane in general position contains a k-gon (i.e., a
subset of k points that form the vertices of a convex polygon). As the research
led to the marriage of Szekeres and Klein, Erdős named it the happy ending
problem. Erdős and Szekeres constructed witnesses of g(k) > 2^{k−2} [9], which
they conjectured to be maximal. The best upper bound is g(k) ≤ 2^{k+o(k)} [20,30].
Determining the value g(5) = 9 requires a more involved case distinction
compared to g(4) = 5 [23]. It took until 2006 to determine that g(6) = 17
via an exhaustive computer search by Szekeres and Peters [31] using 1500 CPU
hours. Marić [25] and Scheucher [28] independently verified g(6) = 17 using
satisfiability (SAT) solving in a few CPU hours. This was later reduced to 10
CPU minutes [29]. The approach presented in this paper computes it in 8.53 CPU
seconds, showing the effectiveness of SAT compared to the original method.
© The Author(s) 2024 
Fig. 1. An illustration for the proof of h(4) = 5: The three possibilities of how five
points can be placed. Each possibility implies a 4-hole.


Erdős also asked whether every sufficiently large point set contains a k-hole:
a k-gon without a point inside. We denote by h(k) the smallest integer—if it 
exists—such that every set of h(k) points in general position in the plane contains 
a k-hole. Both h(3) = 3 and h(4) = 5 are easy to compute (see Fig. 1 for an 
illustration) and coincide with the original setting. Yet the answer can differ a 
lot, as Horton [21] constructed arbitrarily large point sets without 7-holes. 

While Harborth [14] showed in 1978 that h(5) = 10, the existence of 6-holes
remained open until the late 2000s, when Gerken [12]⁴ and Nicolás [26]
independently proved that h(6) is finite. Gerken proved that every 9-gon yields
a 6-hole, thereby showing that h(6) ≤ g(9) ≤ 1717 [33]. The best-known lower
bound h(6) ≥ 30 is witnessed by a set of 29 points without 6-holes, which was
found by Overmars [27] using a local search approach.

We close the gap between the upper and lower bound and ultimately answer 
Erdős’ question by proving that every set of 30 points yields a 6-hole. 


Theorem 1. h(6) = 30. 


Our result is actually stronger and shows that the bounds for 6-holes in point sets
coincide with the bounds for 6-holes in counterclockwise systems [24]. This
represents another success of solving long-standing open problems in mathematics
using SAT, similar to results on Schur number five [16] and Keller’s conjecture [4].
We also investigate the combination of 6-holes and 7-gons and show


Theorem 2. Every set of 24 points in the plane in general position contains a 
6-hole or a 7-gon. 


We achieve these results through the following contributions: 


— We develop a compact and effective SAT encoding for k-gon and k-hole
problems that uses O(n^4) clauses, while existing encodings use O(n^6) clauses.

— We construct a partitioning of k-gon and k-hole problems that allows us to 
solve them with linear-time speedups even when using thousands of cores. 

— We present a novel method of validating SAT-solving results that checks the 
proof while solving the problem using substantially less overhead. 

— We verify most of the presented results using clausal proof checking. 


4 Gerken’s groundbreaking work was awarded the Richard-Rado prize by the German 
Mathematical Society in 2008. 
2 Preliminaries


The SAT problem. The satisfiability problem (SAT) asks whether a Boolean 
formula can be satisfied by some assignment of truth values to its variables. 
The Handbook of Satisfiability [2] provides an overview. We consider formulas 
in conjunctive normal form (CNF), which is the default input of SAT solvers. 
As such, a formula Γ is a conjunction (logical “AND”) of clauses. A clause is a
disjunction (logical “OR”) of literals, where a literal is a Boolean variable or its 
negation. We sometimes write (sets of) clauses using other logical connectives. 

If a formula Γ is found to be satisfiable, modern SAT solvers commonly
output a truth assignment of the variables. Additionally, if a formula turns out 
to be unsatisfiable, sequential SAT solvers produce an independently-checkable 
proof that there exists no assignment that satisfies the formula. 


Verification. The most commonly-used proofs for SAT problems are expressed
in the DRAT clausal proof system [15]. A DRAT proof of unsatisfiability is a
list of clause addition and clause deletion steps. Formally, a clausal proof is a
list of pairs ⟨s_1, C_1⟩, …, ⟨s_m, C_m⟩, where for each i ∈ {1, …, m}, s_i ∈ {a, d} and
C_i is a clause. If s_i = a, the pair is called an addition, and if s_i = d, it is called
a deletion. For a given input formula Γ_0, a clausal proof gives rise to a set of
accumulated formulas Γ_i (i ∈ {1, …, m}) as follows:

\[
\Gamma_i =
\begin{cases}
\Gamma_{i-1} \cup \{C_i\} & \text{if } s_i = a \\
\Gamma_{i-1} \setminus \{C_i\} & \text{if } s_i = d
\end{cases}
\]


Each clause addition must preserve satisfiability, which is usually guaranteed 
by requiring the added clauses to fulfill some efficiently decidable syntactic cri- 
terion. Deletions help to speed up proof checking by keeping the accumulated 
formula small. A valid proof of unsatisfiability must add the empty clause. 
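The accumulated formulas can be computed directly from this definition. The Python sketch below does only that bookkeeping; it does not check the syntactic criterion that each addition must satisfy. Representing clauses as frozensets of integer literals is an assumption, and the function name is hypothetical.

```python
def accumulate(gamma0, proof):
    """Return the accumulated formulas [Γ_1, ..., Γ_m] of a clausal proof."""
    formulas = []
    current = frozenset(gamma0)
    for step, clause in proof:
        if step == "a":
            current = current | {clause}   # addition step
        elif step == "d":
            current = current - {clause}   # deletion step
        else:
            raise ValueError(f"unknown step type {step!r}")
        formulas.append(current)
    return formulas


def c(*lits):
    """Build a clause; c() is the empty clause."""
    return frozenset(lits)


gamma0 = [c(1, 2), c(-1, 2), c(-2)]
proof = [("a", c(2)), ("d", c(1, 2)), ("a", c())]
gammas = accumulate(gamma0, proof)
```

In this small example the last accumulated formula contains the empty clause, as required of a proof of unsatisfiability.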


Cube And Conquer. The cube-and-conquer approach [18] aims to split a SAT
instance Γ into multiple instances Γ_1, …, Γ_m in such a way that Γ is satisfiable
if and only if at least one of the instances Γ_i is satisfiable, thus allowing work
on the different instances Γ_i in parallel. A cube is a conjunction of literals. Let
ψ = (c_1 ∨ ⋯ ∨ c_m) be a disjunction of cubes. When ψ is a tautology, we have

\[
\Gamma \iff \Gamma \land \psi \iff \bigvee_{i=1}^{m} (\Gamma \land c_i) \iff \bigvee_{i=1}^{m} \Gamma_i,
\]

where the different Γ_i := (Γ ∧ c_i) are the instances resulting from the split.
Intuitively, each cube c_i represents a case, i.e., an assumption about a satisfying
assignment to Γ, and soundness comes from ψ being a tautology, which
means that the split into cases is exhaustive. If the split is well designed, then
each Γ_i is a particular case that is substantially easier to solve than Γ, and thus
solving them all in parallel can give significant speed-ups, especially considering
the sequential nature of CDCL at the core of most solvers.
However, the quality of the split (ψ) has an enormous impact on the effectiveness
of the approach. A key challenge is figuring out a high-quality split.
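The soundness of a split can be illustrated on a toy instance. The Python sketch below uses a brute-force satisfiability check rather than a real SAT solver, and the cubes are simply all sign combinations over two chosen variables, which is one (not necessarily good) way to obtain a tautological disjunction of cubes. The formula and all function names are hypothetical.

```python
from itertools import product


def satisfiable(clauses, num_vars, cube=()):
    """Brute-force SAT check of `clauses` under the assumptions in `cube`."""
    for bits in product((False, True), repeat=num_vars):
        def value(lit):
            return bits[abs(lit) - 1] == (lit > 0)
        if all(value(l) for l in cube) and all(any(value(l) for l in cl) for cl in clauses):
            return True
    return False


def cubes_over(variables):
    """All sign combinations over `variables`; their disjunction is a tautology."""
    return [tuple(v if sign else -v for v, sign in zip(variables, signs))
            for signs in product((True, False), repeat=len(variables))]


# Toy instance over variables 1..3, split on variables 1 and 2.
gamma = [(1, 2), (-1, 3), (-2, -3), (2, 3)]
cubes = cubes_over([1, 2])
whole = satisfiable(gamma, 3)
split = [satisfiable(gamma, 3, cube) for cube in cubes]
```

Here the instance is satisfiable exactly because at least one of the four sub-instances is; each call in `split` is independent of the others and could run in parallel.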
Fig. 2. The four ways a point p_i can be inside triangle (p_a, p_b, p_c) based on whether
i < b (left two images) and whether p_i is above the line p_a p_b (first and third image).


3 Trusted Encoding 


To obtain an upper-bound result using a SAT-based approach, we need to show
that every set of n points contains a k-hole. We will do this by constructing
a formula based on n points that asks whether a k-hole can be avoided. If this
formula is unsatisfiable, then we obtain the bound h(k) ≤ n. Instead of reasoning
directly whether an empty k-gon can be avoided, we ask whether every k points
contain at least one triangle with a point inside. The latter implies the former.

We only need to know for each triple of points whether it is empty. Throughout
the paper, we assume that points are sorted with strictly increasing x-coordinates.
This gives us only four options for a point p_i to be inside the triangle
formed by points p_a, p_b, p_c; see Fig. 2. For example, the left image shows that
p_i is inside if a < i < b, p_c and p_i are above the line p_a p_b, and p_i is below
the line p_a p_c. So we need some machinery to express that points are above or
below certain lines. That is what the encoding will provide. For readability, we
sometimes identify points by their indices, that is, we refer to p_a by its index a.

We first present what we call the trusted encoding to determine whether a 
6-hole can be avoided. The encoding needs to be trusted in the sense that we 
do not provide a mechanically verified proof of its correctness. Building upon 
existing work [28], our primary focus is on 6-holes, which constitute our main 
result. The encoding of 6-gons and 7-gons is similar and simpler. During an
initial study, the estimated runtime for showing h(6) ≤ 30 using this encoding
and off-the-shelf partitioning was roughly 1000 CPU years. The optimizations 
in Sections 4 and 5 reduce the computational costs to about 2 CPU years. 


3.1 Orientation Variables 


We formulate the problem in such a way that all reasoning is based solely on the
relative positions of points. Thus, we do not encode coordinates but only
orientations of point triples. For a point set S = {p_1, …, p_n} with p_i = (x_i, y_i),
the triple (p_a, p_b, p_c) with a < b < c is positively oriented (resp. negatively
oriented) if p_c lies above (resp. below) the line p_a p_b through p_a and p_b. The
notion of positive orientation corresponds to Knuth’s counterclockwise relation [24].
Fig. 3 illustrates a positively-oriented triple (p_a, p_b, p_c) and a
negatively-oriented triple (p_a, p_b, p_d).

Fig. 3. An illustration of triple orientations.
To search for point sets without k-gons and k-holes, we introduce a Boolean
orientation variable o_{a,b,c} for each triple (p_a, p_b, p_c) with a < b < c. Intuitively,
o_{a,b,c} is supposed to be true if the triple is positively oriented. Since we assume
general position, no three points lie on a common line, so o_{a,b,c} being false means
that the triple is negatively oriented.
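For a concrete point set, the intended value of each orientation variable follows from the sign of a 2×2 determinant. The Python sketch below computes these values; it is an illustration of the definition, not part of the encoding itself, and the function names and sample points are hypothetical.

```python
def orientation(p, q, r):
    """True if r lies above the line through p and q, for p left of q."""
    (px, py), (qx, qy), (rx, ry) = p, q, r
    det = (qx - px) * (ry - py) - (qy - py) * (rx - px)
    if det == 0:
        raise ValueError("points are collinear: not in general position")
    return det > 0


def orientation_variables(points):
    """Map each index triple a < b < c (1-based, x-sorted) to the value of o_{a,b,c}."""
    pts = sorted(points)  # sort by increasing x-coordinate
    n = len(pts)
    return {(a + 1, b + 1, c + 1): orientation(pts[a], pts[b], pts[c])
            for a in range(n) for b in range(a + 1, n) for c in range(b + 1, n)}


# Four sample points: p_3 lies above the line p_1 p_2, p_4 lies below it.
o = orientation_variables([(0, 0), (1, 0), (2, 1), (3, -1)])
```

The SAT encoding works in the opposite direction: it leaves these variables free and searches over all assignments, whether or not they come from actual coordinates.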


3.2 Containment Variables, 3-Hole Variables, and Constraints 


Using orientation variables, we can now express what it means for a triangle to
be empty. We define containment variables c_{i;a,b,c} to encode whether point p_i
lies inside the triangle spanned by {p_a, p_b, p_c}. Since the points have increasing
x-coordinates, containment is only possible if a < i < c. We use two kinds of
definitions, depending on whether i is smaller or larger than b (see Fig. 2). The
first definition is for the case a < i < b. Note that if o_{a,b,c} is true, we only need to
know whether i is above the line p_a p_b and below the line p_a p_c. Earlier work [28]
used an extended definition that included the redundant variable o_{i,b,c}. Avoiding
this variable makes the definition more compact (six instead of eight clauses)
and the resulting formula is easier to solve.


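\[
c_{i;a,b,c} \leftrightarrow \Big( \big(o_{a,b,c} \rightarrow (\lnot o_{a,i,b} \land o_{a,i,c})\big) \land \big(\lnot o_{a,b,c} \rightarrow (o_{a,i,b} \land \lnot o_{a,i,c})\big) \Big) \tag{1}
\]

The second definition is for b < i < c, which avoids using the variable o_{a,b,i}:

\[
c_{i;a,b,c} \leftrightarrow \Big( \big(o_{a,b,c} \rightarrow (o_{a,i,c} \land \lnot o_{b,i,c})\big) \land \big(\lnot o_{a,b,c} \rightarrow (\lnot o_{a,i,c} \land o_{b,i,c})\big) \Big) \tag{2}
\]

Each definition translates into six clauses (without using Tseitin variables).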
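As a sanity check, definitions (1) and (2), as we read them, can be compared against a direct geometric point-in-triangle test on random points. The Python sketch below is illustrative only: the seed, the number of points, and all function names are assumptions, and random floating-point coordinates are assumed to be in general position.

```python
import random
from itertools import combinations


def ccw(p, q, r):
    """True if (p, q, r) is a counterclockwise turn; for p left of q, r is above line pq."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0]) > 0


def inside(p, a, b, c):
    """Geometric test: p lies strictly inside triangle abc."""
    signs = [ccw(a, b, p), ccw(b, c, p), ccw(c, a, p)]
    return all(signs) or not any(signs)


def o(pts, x, y, z):
    """Orientation variable o_{x,y,z} for indices x < y < z of x-sorted points."""
    return ccw(pts[x], pts[y], pts[z])


def contained(pts, i, a, b, c):
    """Containment c_{i;a,b,c} via definitions (1) and (2); needs a < i < c and i != b."""
    if i < b:  # definition (1): uses o_{a,i,b} and o_{a,i,c}
        if o(pts, a, b, c):
            return (not o(pts, a, i, b)) and o(pts, a, i, c)
        return o(pts, a, i, b) and not o(pts, a, i, c)
    # definition (2): uses o_{a,i,c} and o_{b,i,c}
    if o(pts, a, b, c):
        return o(pts, a, i, c) and not o(pts, b, i, c)
    return (not o(pts, a, i, c)) and o(pts, b, i, c)


random.seed(0)
pts = sorted((random.random(), random.random()) for _ in range(12))
mismatches = sum(
    contained(pts, i, a, b, c) != inside(pts[i], pts[a], pts[b], pts[c])
    for a, b, c in combinations(range(len(pts)), 3)
    for i in range(a + 1, c) if i != b
)
```

Each branch consults only two orientation variables besides o_{a,b,c}, which is what keeps the clausal form at six clauses per definition.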
Additionally, we introduce definitions h_{a,b,c} of 3-hole variables that express
whether the triangle spanned by {p_a, p_b, p_c} is a 3-hole. The triangle {p_a, p_b, p_c}
forms a 3-hole if and only if no point p_i lies in its interior. A point p_i can only
be an inner point if it lies in the vertical strip between p_a and p_c and if it is
distinct from p_b. Since the points are sorted, the index i of an interior point p_i
must therefore fulfill a < i < c and i ≠ b. Logically, the definition is as follows:

\[
h_{a,b,c} \leftrightarrow \bigwedge_{\substack{a < i < c \\ i \neq b}} \lnot c_{i;a,b,c}. \tag{3}
\]

Finally, we encode the “forbid k-hole” constraint as follows: For each subset
X ⊆ S of size k, at least one of the triangles formed by three points in X must
not be a 3-hole. So for k = 6, each clause consists of \binom{6}{3} = 20 literals.

\[
\bigwedge_{\substack{X \subseteq S \\ |X| = k}} \Big( \bigvee_{\substack{a,b,c \in X \\ a < b < c}} \lnot h_{a,b,c} \Big) \tag{4}
\]

In Section 4, we will optimize the encoding. Most optimizations aim to im- 
prove the encoding of the constraint (4). 
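To get a feeling for the size of constraint (4), the following Python sketch enumerates its clauses for a small n and counts them. The literal representation ("not_h", triple) and the function name are hypothetical.

```python
from itertools import combinations
from math import comb


def forbid_hole_clauses(n, k):
    """Yield one clause per k-subset X: the disjunction of ¬h_{a,b,c} over a < b < c in X."""
    for subset in combinations(range(1, n + 1), k):
        yield [("not_h", triple) for triple in combinations(subset, 3)]


clauses = list(forbid_hole_clauses(8, 6))
num_clauses_n30 = comb(30, 6)  # number of 6-subsets of 30 points
```

For n = 30 and k = 6 this amounts to 593,775 clauses of 20 literals each, one per 6-subset of the points.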
Fig. 4. All possibilities to place four points, when points are sorted from left to right.


3.3 Forbidding Non-Realizable Patterns 


Only a small fraction of all assignments to the \binom{n}{3} orientation variables,
2^{Θ(n log n)}, actually describe point sets [3]. However, we can reduce the search space from
2^{Θ(n^3)} to 2^{Θ(n^2)} by forbidding non-realizable patterns [24]. Consider four points
p_a, p_b, p_c, p_d in a sorted point set with a < b < c < d. The leftmost three
points determine three lines p_a p_b, p_a p_c, p_b p_c, which partition the open half-plane
{(x, y) ∈ ℝ² : x > x_c} into four regions (see Fig. 4). After placing p_a,
p_b, p_c, observe that all realizable positions of point p_d obey the following
implications: o_{a,b,c} ∧ o_{a,c,d} ⇒ o_{a,b,d} and o_{a,b,c} ∧ o_{b,c,d} ⇒ o_{a,c,d}. Similarly for the
negations, ¬o_{a,b,c} ∧ ¬o_{a,c,d} ⇒ ¬o_{a,b,d} and ¬o_{a,b,c} ∧ ¬o_{b,c,d} ⇒ ¬o_{a,c,d}. These implications
are equivalent to the following clauses (grouping positive and negative):

\[
(\lnot o_{a,b,c} \lor \lnot o_{a,c,d} \lor o_{a,b,d}) \land (o_{a,b,c} \lor o_{a,c,d} \lor \lnot o_{a,b,d}) \tag{5}
\]

\[
(\lnot o_{a,b,c} \lor \lnot o_{b,c,d} \lor o_{a,c,d}) \land (o_{a,b,c} \lor o_{b,c,d} \lor \lnot o_{a,c,d}) \tag{6}
\]

Forbidding these non-realizable assignments was also used for g(6) ≤ 17 [31].
Some call the restriction signotope axioms [10]. The counterclockwise system
axioms [24] achieve the same effect, but require O(n^5) clauses instead of O(n^4).
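The Python sketch below generates the four clauses of (5) and (6), as we read them, for every quadruple a < b < c < d, and checks that the orientations of a random point set satisfy all of them. The literal representation (sign, triple), the seed, and the number of points are assumptions made for illustration.

```python
import random
from itertools import combinations


def signotope_clauses(n):
    """Clauses (5) and (6) for all a < b < c < d; a literal is (sign, triple)."""
    clauses = []
    for a, b, c, d in combinations(range(1, n + 1), 4):
        abc, abd, acd, bcd = (a, b, c), (a, b, d), (a, c, d), (b, c, d)
        clauses.append([(False, abc), (False, acd), (True, abd)])   # (5), first clause
        clauses.append([(True, abc), (True, acd), (False, abd)])    # (5), second clause
        clauses.append([(False, abc), (False, bcd), (True, acd)])   # (6), first clause
        clauses.append([(True, abc), (True, bcd), (False, acd)])    # (6), second clause
    return clauses


def orientation(p, q, r):
    """True if r lies above the line through p and q, for p left of q."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0]) > 0


random.seed(1)
n = 9
pts = sorted((random.random(), random.random()) for _ in range(n))
value = {t: orientation(*(pts[i - 1] for i in t))
         for t in combinations(range(1, n + 1), 3)}
all_clauses = signotope_clauses(n)
violated = [cl for cl in all_clauses
            if not any(value[t] == sign for sign, t in cl)]
```

The clause count is 4 · C(n, 4), in line with the O(n^4) bound, and no clause is violated by an assignment that comes from real coordinates.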


3.4 Initial Symmetry Breaking 


To further reduce the search space, we ensure that p_1 lies on the boundary of the
convex hull (i.e., it is an extremal point) and that p_2, …, p_n appear around p_1 in
counterclockwise order, thus providing us the unit clauses (o_{1,a,b}) for 1 < a < b.
Without loss of generality, we can label points to satisfy the above, because the 
labeling doesn't affect gons and holes. However, we also want points to be sorted 
from left to right. One can satisfy both orderings at the same time using the 
lemma below. We attach a proof in the extended version [19]. 


Lemma 1 ([28, Lemma 1]). Let S = {p_1, …, p_n} be a point set in the plane
in general position such that p_1 is extremal and p_2, …, p_n appear (clockwise or
counterclockwise) around p_1. Then there exists a point set S̃ = {p̃_1, …, p̃_n} with
the same triple orientations (in particular, p̃_1 is extremal and p̃_2, …, p̃_n appear
around p̃_1) such that the points p̃_1, …, p̃_n have increasing x-coordinates.


Happy Ending: An Empty Hexagon in Every Set of 30 Points 67 
4 Optimizing the Encoding 


An ideal SAT encoding has the following three properties: 


1) it is compact to reduce the cost of unit propagation (and cache misses); 
2) it detects conflicts as early as possible (i.e., is domain consistent [11]); and 
3) it contains variables that can generalize conflicts effectively. 


The trusted encoding lacks these properties because it has $O(n^6)$ clauses,
cannot quickly detect holes, and has no variables that can generalize conflicts.
In this section, we show how to modify the trusted encoding to obtain all three
properties. All the modifications are expressible in a proof to ensure correctness.


4.1 Toward Domain Consistency 


The effectiveness of an encoding depends on how quickly the solver can determine 
a conflict. Given an assignment, we want to derive as much as possible via unit 
propagation. This is known as domain consistency [11]. The trusted encoding 
does not have this property. We modify the encoding below to boost propagation. 

We borrow from the method by Szekeres and Peters that a k-gon can be detected
by looking at assignments to $k - 2$ orientation variables [31]. For example,
if $o_{a,b,c}$, $o_{b,c,d}$, $o_{c,d,e}$, and $o_{d,e,f}$ with $a<b<c<d<e<f$ are assigned to the same
truth value, then this implies that the points form a 6-gon. An illustration of
this assignment is shown in Fig. 5 (left). We combine this with our observation
below that only a specific triangle has to be empty to infer a 6-hole somewhere.
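
The detection rule can be sanity-checked numerically. The sketch below is an illustration under our own assumptions (integer points with distinct x-coordinates, general position enforced by rejection sampling, helper names of our choosing): whenever the four consecutive orientations of six x-sorted points agree, the six points are verified to be in convex position.

```python
from itertools import combinations
import random

def cross(p, q, r):
    """Positive if p, q, r make a counterclockwise turn."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])

def same_turns(pts):
    """True if the k-2 consecutive orientations of x-sorted pts agree."""
    turns = [cross(pts[i], pts[i + 1], pts[i + 2]) > 0 for i in range(len(pts) - 2)]
    return all(turns) or not any(turns)

def in_convex_position(pts):
    """True if no point lies inside a triangle spanned by three other points."""
    for tri in combinations(pts, 3):
        for p in pts:
            if p in tri:
                continue
            sides = [cross(tri[i], tri[(i + 1) % 3], p) > 0 for i in range(3)]
            if all(sides) or not any(sides):
                return False  # p lies inside the triangle
    return True

rng = random.Random(1)
hits = 0
while hits < 50:
    xs = rng.sample(range(60), 6)
    pts = sorted((x, rng.randrange(60)) for x in xs)
    if any(cross(*t) == 0 for t in combinations(pts, 3)):
        continue  # skip sets that are not in general position
    if same_turns(pts):
        assert in_convex_position(pts)  # the k-2 orientations certify a 6-gon
        hits += 1
print("checked", hits, "chains")
```

The converse does not hold: a 6-gon whose points lie on both sides of the line through its leftmost and rightmost points has mixed consecutive orientations, which is why several configurations have to be considered.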

Consider a scenario involving six points, a, b, c, d, e, and f, that are arranged
from left to right. In this scenario, the orientation variables $o_{a,b,c}$, $o_{b,c,d}$, $o_{c,d,e}$,
and $o_{d,e,f}$ are all set to false, while the 3-hole variable $h_{a,c,e}$ is set to true. As
mentioned above, this implies that the points form a 6-gon. Together with the
3-hole variable $h_{a,c,e}$ being set to true, we can deduce the existence of a 6-hole: The
6-gon is either a 6-hole or it contains a 6-hole. The reasoning will be explained
in the next paragraph. Note that in the trusted encoding of this scenario, only
one out of the twenty literals in the corresponding 'forbid 6-hole' clause is false.
This suggests that the solver is still quite far from detecting a conflict.

A crucial insight underpinning our efficient encoding is the understanding
that the truth of the variable $h_{a,c,e}$ alone is sufficient to infer the existence of
a 6-hole. Consider the following rationale: If the triangle $\{a,b,c\}$ contains any
points, then there must be at least one point inside the triangle that is closer to
the line ac than point b is. Let's denote the nearest point as i. The proximity of
i to the line ac guarantees that the triangle $\{a, i, c\}$ is empty. We can substitute
b with i to create a smaller but similarly shaped hexagon. This logic extends to
other triangles as well; specifically, the truth values of $h_{c,d,e}$ and $h_{a,e,f}$ are not
necessary to infer the presence of a 6-hole.

Our insight emerged when we noticed that the SAT solver eliminated 3-hole
literals from previous encodings. This elimination occurred primarily when only
a few points existed between the leftmost and rightmost points of a triangle. On
the other hand, the solver struggled significantly to identify the redundancy of
these 3-hole literals when the leftmost and rightmost points of a triangle were
far apart. Therefore, to enhance the encoding's effectiveness, we chose to omit
these 3-hole literals (instead of letting the solver figure it out).

Fig. 5. Three types of 6-gons: left, all points are on one side of line af (2 cases);
middle, three points are on one side and one point is on the other side of line af
(8 cases); and right, two points are on either side of line af (6 cases). If the marked
triangle is empty, we can conclude that there exists a 6-hole.

Blocking the existence of a 6-hole within the 6-gon described above can be 
achieved with the following clause (which simply negates the assignment): 


$o_{a,b,c} \lor o_{b,c,d} \lor o_{c,d,e} \lor o_{d,e,f} \lor \lnot h_{a,c,e}$ (7)


For each set of six points, 16 different configurations can result in a 6-hole.
These configurations depend on which points are positioned left or right of the line
connecting the leftmost and rightmost points among the six. The three types of
such configurations are illustrated in Fig. 5, while the remaining configurations
are symmetrical to these. It is important to note that this adds $16 \times \binom{n}{6}$ clauses
to the formula, significantly increasing its size.

We can reduce the number of clauses by about 30% by strategically selecting
which triangle within a 6-gon is checked to be empty (i.e., which 3-hole literal
will be used). The two options are the triangle that includes the leftmost point
(as depicted in Fig. 5) and the triangle with the second-leftmost point. If the
leftmost point is $p_1$, we opt for the second-leftmost point; otherwise, we choose
the leftmost point. After propagating the unit clauses $o_{1,a,b}$, the clauses that
describe configurations with three points below the line af become subsumed
by the clause for the configuration with four points below the line 1f.


4.2 An $O(n^4)$ Encoding


This section is rather technical. It introduces auxiliary variables to reduce our
encoding to $O(n^4)$ clauses. The process is known as structured bounded variable
addition (SBVA) [13], which in each step adds a new auxiliary variable to
encode a subset of the formula more compactly. SBVA heuristically selects the
auxiliary variables. Instead, we select them manually because it is more effective,
the new variables have meaning, and SBVA is extremely slow on this problem.
Eliminating the auxiliary variables results in the encoding of Section 4.1.

The first type of these variables, $u^4_{a,c,d}$, represents the presence of a 4-gon
$\{a,b,c,d\}$ such that points a, b, c, d appear in this order from left to right and
b and c are above the line ad. Furthermore, the variables $u^5_{a,d,e}$ indicate the
existence of a 5-gon $\{a,b,c,d,e\}$ with the property that the points a, b, c, d, e
appear in this order from left to right, the points b, c, and d are above the line
ae, and the triangle $\{a, c, e\}$ is empty. This configuration implies the existence of
a 5-hole within $\{a, b, c, d, e\}$ using similar reasoning as described in Section 4.1.
The clauses enforcing these properties are outlined below.


$u^4_{a,c,d} \lor o_{a,b,c} \lor o_{b,c,d}$ with $a<b<c<d$ (8)
$u^5_{a,d,e} \lor \lnot u^4_{a,c,d} \lor o_{c,d,e} \lor \lnot h_{a,c,e}$ with $a<c<d<e$ (9)


In the following we distinguish five types of 6-holes by the number of their
points that lie above/below the line connecting the leftmost and rightmost points.
Fig. 5 shows the three configurations with four, three, and two points above the
line, respectively. The two cases with three and four points below the line are
symmetric but will be handled in a different and more efficient manner below.
To block all 6-holes with configurations having three or four points above the
line connecting the leftmost and rightmost points, we utilize the variables $u^5_{a,d,e}$.
Specifically, a configuration with three points above occurs if there is a point b
situated between a and e, lying below the line ae. Also, the configuration with
four points above arises when a point f, located to the right of e, falls below
the line de. The associated clauses for these configurations are detailed below.
The omission of 3-hole literals is justified by our knowledge that a 3-hole exists
among a, c, and e for some point c positioned above the line ae.


$\lnot u^5_{a,d,e} \lor \lnot o_{a,b,e}$ with $a<d<e$, $a<b<e$ (10)
$\lnot u^5_{a,d,e} \lor o_{d,e,f}$ with $a<d<e<f$ (11)


To block the third type of a 6-hole, we need to introduce variables $v^4_{a,c,d}$
which, similar to $u^4_{a,c,d}$, indicate the presence of a 4-gon $\{a,b,c,d\}$ with the
property that the points a, b, c, d appear in this order from left to right and b
and c are below the line ad. The clauses that encode these variables are:


$v^4_{a,c,d} \lor \lnot o_{a,b,c} \lor \lnot o_{b,c,d}$ with $a<b<c<d$ (12)


Using the variables $u^4_{a,c,d}$ and $v^4_{a,c,d}$ we are now ready to block the configuration
of the third type of a 6-hole where two points lie above and two points lie
below the line connecting the leftmost and rightmost points; see Fig. 5 (right).
Recall that $u^4_{a,c,d}$ denotes a 4-gon situated above the line ad, with c being the
second-rightmost point. Also, $v^4_{a,c',d}$ denotes a 4-gon below the line ad, with c'
as the second-rightmost point. A 6-hole exists if both $u^4_{a,c,d}$ and $v^4_{a,c',d}$ are true
for some points a and d when there are no points within the triangle formed by
a, c, and c'. Or, in clauses:


$\lnot u^4_{a,c,d} \lor \lnot v^4_{a,c',d} \lor \lnot h_{a,c,c'}$ with $a<c<c'<d$ (13)
$\lnot u^4_{a,c,d} \lor \lnot v^4_{a,c',d} \lor \lnot h_{a,c',c}$ with $a<c'<c<d$ (14)


The remaining configurations to consider involve those with three or four
points below the line joining the leftmost and rightmost points. As we discussed
at the end of Section 4.1, these configurations can be encoded more compactly.
We only need to block the existence of 5-holes $\{a,b,c,d,e\}$ with the property
that the points 1, a, b, c, d, e appear in this order from left to right and the points
b, c, and d are below the line ae. The reasoning is as follows: if such a 5-hole
exists, it can be expanded into a 6-hole by the closest point to line ab within
the triangle $\{1, a, b\}$ (which is point 1 if the triangle is empty). Additionally, by
blocking these specific 5-holes, we simultaneously block all 6-holes with three or
four points below the line between the leftmost and rightmost points. Following
the earlier cases, we only require a single 3-hole literal which ensures that the
triangle $\{a, c, e\}$ is empty. The clauses to block these 5-holes are as follows:


$\lnot v^4_{a,c,d} \lor \lnot o_{c,d,e} \lor \lnot h_{a,c,e}$ with $1<a<c<d<e$ (15)


This encoding uses $O(n^4)$ clauses, while it has the same propagation power as
having all the $16 \times \binom{n}{6}$ clauses in the domain-consistent encoding of Section 4.1. In
general, the trusted encoding for k-holes uses $O(n^k)$ clauses, while the optimized
encoding when generalized to k-holes has only $O(kn^4)$ clauses, or $O(n^4)$ for every
fixed k. An encoding of size $O(n^4)$ for k-gons is analogous: simply remove the
3-hole literals from the clauses.
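
To get a feel for the size difference, the index ranges stated with clauses (8) to (15) can be counted directly. The sketch below is only an illustration: it assumes one clause per index tuple, reads the ranges of (10) literally (allowing b = d), and the function name `count_clause_indices` is ours. It ignores all other parts of the formula, so its totals are not the clause counts reported in the evaluation.

```python
from itertools import combinations
from math import comb

def count_clause_indices(n):
    """Count index tuples for the clause families (8)-(15) on points 1..n."""
    quads = comb(n, 4)  # families indexed by four increasing points
    # (10): a < d < e and a < b < e, so b and d range over the points between a and e
    fam10 = sum((e - a - 1) ** 2 for a, e in combinations(range(1, n + 1), 2))
    return {
        "(8)": quads, "(9)": quads, "(10)": fam10, "(11)": quads,
        "(12)": quads, "(13)": quads, "(14)": quads,
        "(15)": comb(n - 1, 4),  # point 1 is excluded: 1 < a < c < d < e
    }

n = 30
counts = count_clause_indices(n)
print("auxiliary-variable families:", sum(counts.values()))
print("16 * C(n, 6):               ", 16 * comb(n, 6))
```

Each family is indexed by at most four points, which is where the $O(n^4)$ bound comes from, whereas the clauses of Section 4.1 are indexed by six points.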


4.3 Minor Optimizations 


We can make the encoding even more compact by removing a large fraction of 
the clauses from the trusted encoding. Note that constraints to forbid 6-holes 
contain only negative 3-hole literals. That means that only half of the constraints 
to define the 3-hole variables are actually required. This in turn shows that only 
half of the inside variable definitions are required. So, instead of (1), (2), and (3), 
it suffices to use the following: 


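$c_{i;a,b,c} \Rightarrow \big( (o_{a,b,c} \Rightarrow (\lnot o_{a,i,b} \land o_{a,i,c})) \land (\lnot o_{a,b,c} \Rightarrow (o_{a,i,b} \land \lnot o_{a,i,c})) \big)$ (16)


$c_{i;a,b,c} \Rightarrow \big( (o_{a,b,c} \Rightarrow (\lnot o_{b,i,c} \land o_{a,i,c})) \land (\lnot o_{a,b,c} \Rightarrow (o_{b,i,c} \land \lnot o_{a,i,c})) \big)$ (17)


$h_{a,b,c} \Leftarrow \bigwedge_{a<i<c,\ i \neq b} \lnot c_{i;a,b,c}$ (18)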

It is worth noting that the SAT preprocessing technique blocked-clause elimination
(BCE) can automatically remove the omitted clauses [22]. However, for
efficiency reasons, BCE is turned off by default in top-tier solvers, including
the solver CaDiCaL, which we used for the proof. During initial experiments, we
observed that omitting these clauses slightly improves the performance.

Finally, the variables $u^4_{a,c,d}$ and $v^4_{a,c,d}$ can be used to more compactly encode
the clauses (6). We can replace them with the following clauses:


$(\lnot u^4_{a,c,d} \lor \lnot o_{a,c,d}) \land (\lnot v^4_{a,c,d} \lor o_{a,c,d})$ with $a<c<d$ (19)


4.4 Breaking the Reflection Symmetry


Holes are invariant to reflectional symmetry: If we mirror a point set S, then
the counterclockwise order around the extremal point $p_1$ (which is $p_2, \ldots, p_n$)
is reversed (to $p_n, \ldots, p_2$). By relabeling points to preserve the counterclockwise
order, we preserve $o_{1,a,b}$ = true for $a < b$, while the original orientation variables
$o_{a,b,c}$ with $2 \le a < b < c \le n$ are mapped to $o_{n-c+2,n-b+2,n-a+2}$. A similar
mapping applies to the containment and 3-hole variables. The trusted encoding
maps almost onto itself, except for the missing reflection clauses of (5) and (6).
As a fix for verification, we add each reflected clause using one resolution step.
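
The index mapping is easy to experiment with. The following sketch (with the illustrative function name `reflect` and an arbitrary choice of n = 10) checks that the mapping is an involution on index triples and that consecutive triples are mapped to consecutive triples, which is what makes the variables $o_{a,a+1,a+2}$ convenient for symmetry breaking.

```python
from itertools import combinations

def reflect(triple, n):
    """Image of the index triple (a, b, c), 2 <= a < b < c <= n, under reflection."""
    a, b, c = triple
    return (n - c + 2, n - b + 2, n - a + 2)

n = 10
triples = list(combinations(range(2, n + 1), 3))

# Applying the mapping twice gives back the original triple.
assert all(reflect(reflect(t, n), n) == t for t in triples)

# Consecutive triples stay consecutive: (a, a+1, a+2) -> (n-a, n-a+1, n-a+2).
assert all(reflect((a, a + 1, a + 2), n) == (n - a, n - a + 1, n - a + 2)
           for a in range(2, n - 1))

# Index triples that are mapped to themselves.
fixed = [t for t in triples if reflect(t, n) == t]
print(len(fixed), "of", len(triples), "index triples are fixed")
```

Note that this counts fixed index triples only; whether a full assignment of orientations maps to itself is a stronger condition.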

Since only a tiny fraction of triple orientations map to themselves (so-called
involutions), breaking the reflectional symmetry reduces the search space by a
factor of almost 2. We partially break this symmetry by constraining the variables
$o_{a,a+1,a+2}$ with $2 \le a \le n - 2$. We used the symmetry-breaking predicate
below, because it is compatible with our cube generation, described in Section 5.


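$o_{\lceil n/2 \rceil - 1, \lceil n/2 \rceil, \lceil n/2 \rceil + 1}, \ldots, o_{2,3,4} \preceq o_{\lfloor n/2 \rfloor + 1, \lfloor n/2 \rfloor + 2, \lfloor n/2 \rfloor + 3}, \ldots, o_{n-2,n-1,n}$ (20)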


One symmetry that remains is the choice of the first point. Any point on 
the convex hull could be picked for this purpose, and breaking it can potentially 
reduce the search space by at least a factor of 3. However, breaking this symmetry 
effectively is complicated and we therefore left it on the table. 


5 Problem Partitioning 


The formula to determine that $h(6) \le 30$ requires CPU years to solve. To compute
this in reasonable time, the problem needs to be partitioned into many
small subproblems that can be solved in parallel. Although there exist tools to
do the partitioning automatically [18], we observed that this partitioning was
ineffective. As a consequence, we focused on manual partitioning.

During our initial experiments, we determined which orientation variables
were suitable for splitting. We used the formula for $g(6) \le 17$ for this purpose
because its runtime is large enough to make meaningful observations and small
enough to explore many options. It turned out that the variables $o_{a,a+1,a+2}$
were the most effective choice for splitting the problem. Assigning one of these
$o_{a,a+1,a+2}$ variables to true/false roughly halves the search space and reduces
the runtime by a factor of roughly 2.

A problem with n points has $n - 3$ free variables of the form $o_{a,a+1,a+2}$, as the
variable $o_{1,2,3}$ is already fixed by the symmetry breaking. One cannot generate
$2^{n-3}$ equally easy subproblems, because $(\lnot o_{a,a+1,a+2} \lor \lnot o_{a+1,a+2,a+3} \lor \lnot o_{a+2,a+3,a+4})$
and $(o_{a,a+1,a+2} \lor o_{a+1,a+2,a+3} \lor o_{a+2,a+3,a+4} \lor o_{a+3,a+4,a+5})$ follow directly from
the optimized formula after unit propagation. Thus, assigning three consecutive
$o_{a,a+1,a+2}$ variables to true results directly in a falsified clause, as it would create
a 6-hole among the points $p_1, p_a, \ldots, p_{a+4}$. The same holds for four consecutive
$o_{a,a+1,a+2}$ variables assigned to false, which would create a 6-hole among the
points $p_a, \ldots, p_{a+5}$. The asymmetry is due to fixing the variables $o_{1,a,b}$ to true.
If we assigned them to false, then the opposite would happen.

We observed that limiting the partition to variables involving the middle
points reduces the total runtime. We will demonstrate such experiments in Section 6.2.
So, to obtain suitable cubes, we considered all assignments of the sequence
$o_{a,a+1,a+2}, o_{a+1,a+2,a+3}, \ldots, o_{a+\ell-1,a+\ell,a+\ell+1}$ for a suitable constant $\ell$
and a = nit — 1 such that the above properties are fulfilled, that is, no three con- 
secutive entries are true and no four consecutive entries are false. In the following 
we refer to $\ell$ as the length of the cube-space. In our experiments, we observed
that picking $\ell < n - 3$ reduces the overall computational costs. Specifically, for
the $h(6) \le 30$ experiments, we use length $\ell = 21$.
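
The admissible assignments of the splitting sequence can be enumerated directly. The sketch below assumes only the two rules stated above (no three consecutive true entries, no four consecutive false entries); the function names are ours, and the counts it prints are not the cube counts of the paper, since the symmetry-breaking predicate and any further splitting are ignored.

```python
from itertools import product

def valid_cubes(length):
    """All assignments with no three consecutive True and no four consecutive False."""
    cubes = []
    for bits in product([True, False], repeat=length):
        s = "".join("+" if b else "-" for b in bits)
        if "+++" not in s and "----" not in s:
            cubes.append(bits)
    return cubes

def count_valid(length):
    """Same count by dynamic programming over (last value, length of current run)."""
    if length == 0:
        return 1
    states = {(True, 1): 1, (False, 1): 1}
    for _ in range(length - 1):
        nxt = {}
        for (value, run), count in states.items():
            flipped = (not value, 1)                 # start a new run
            nxt[flipped] = nxt.get(flipped, 0) + count
            if run < (2 if value else 3):            # extend the current run if allowed
                extended = (value, run + 1)
                nxt[extended] = nxt.get(extended, 0) + count
        states = nxt
    return sum(states.values())

for length in (3, 4, 10):
    print(length, len(valid_cubes(length)), count_valid(length))
```

The explicit enumeration is exponential in the length and only meant for small values; the dynamic program gives the count for larger lengths.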

Our initial experiments showed that the runtime of cubes grows exponentially
with the number of occurrences of the alternating pattern $o_{b,b+1,b+2} = +$,
$o_{b+1,b+2,b+3} = -$, $o_{b+2,b+3,b+4} = +$. As a consequence, the hardest cube for
$h(6) \le 30$ would still require days of computing time, thereby limiting parallelism.
To deal with this issue, we further partition cubes that contain this
pattern. For each occurrence of the alternating pattern in a cube, we split the
cube into two cubes: one that extends it with $o_{b,b+2,b+4}$ and one that extends it
with $\lnot o_{b,b+2,b+4}$. Note that we do this for each occurrence. So a cube containing
m of these patterns is split into $2^m$ cubes. This reduced the computational costs
of the hardest cubes to less than an hour.
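
The refinement step can be sketched as follows. This is an illustration with names and a data layout of our own choosing: a cube is a tuple of truth values for consecutive variables starting at index `start`, and the extra split variable for a pattern at position b is assumed to be the one indexed by (b, b+2, b+4), as described above.

```python
from itertools import product

def alternating_positions(bits):
    """Offsets i where bits[i], bits[i+1], bits[i+2] equal +, -, +."""
    return [i for i in range(len(bits) - 2)
            if bits[i] and not bits[i + 1] and bits[i + 2]]

def split_cube(bits, start):
    """Split a cube on one extra variable per occurrence of the pattern.

    bits[i] is the value of the variable with indices (b, b+1, b+2), b = start + i.
    Returns a list of cubes, each a dict from index triples to truth values.
    """
    base = {(start + i, start + i + 1, start + i + 2): v for i, v in enumerate(bits)}
    extra = [(start + i, start + i + 2, start + i + 4)
             for i in alternating_positions(bits)]
    cubes = []
    for values in product([True, False], repeat=len(extra)):
        cube = dict(base)
        cube.update(zip(extra, values))
        cubes.append(cube)
    return cubes

bits = (True, False, True, False, True, True, False)  # contains the pattern twice
cubes = split_cube(bits, start=5)
print(len(alternating_positions(bits)), len(cubes))
```

Overlapping occurrences, as in the example, are counted separately, so a cube with m occurrences yields $2^m$ refined cubes.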


6 Evaluation 


For the experiments, we use the solver CaDiCaL (version 1.9.3) [1], which is currently
the only top-tier solver that can produce LRAT proofs directly. The efficient,
verified checker cakeLPR [32] validated the proofs. We run CaDiCaL with
command-line options: --sat --reducetarget=10 --forcephase --phase=0.
The first option reduces the number of restarts. This is typically more useful
for satisfiable formulas (as the name suggests), but in this case it is also helpful
for unsatisfiable formulas. The second option turns off the aggressive clause
deletion strategy. The last two options turn on negative branching, a MiniSAT
heuristic [7]. Experiments were run on a specialized, internal Amazon Web Services
solver framework that provides cloud-level scaling. The framework used
m6i.xlarge instances, which have two physical cores and 16 GB of memory.


6.1 Impact of the Encoding 


To illustrate the impact of the encoding on the performance, we show some statistics
on various encodings of the $h(6) \le 30$ formula. We restricted this experiment
to solving a single randomly-picked subproblem. For other subproblems, the results
were similar. We experimented with the following five encodings:


– T: the trusted encoding presented in Section 3
– $O_1$: T with (4) replaced by the domain-consistent encoding (7) of Section 4.1
– $O_2$: $O_1$ with (7) replaced by the $O(n^4)$ encoding (8)-(15) of Section 4.2
– $O_3$: $O_2$ with the minor optimizations that replace (1), (2), (3), and (6) by
(16), (17), (18), and (19), respectively, see Section 4.3
– $O_4$: $O_3$ extended with the symmetry-breaking predicate from Section 4.4


Table 1 summarizes the results. The domain-consistent encoding can be
solved more efficiently than the trusted encoding while having over five times
as many clauses. The reason for the faster performance becomes clear when
looking at the number of conflicts and propagations. The domain-consistent encoding
requires just over a fifth as many conflicts and propagations to determine
unsatisfiability. The auxiliary variables that enable the $O(n^4)$ encoding reduce
the size by almost an order of magnitude. The resulting formula can be solved
three times as fast, while using a similar number of conflicts and propagations.
The minor optimizations reduce the size by roughly a third and further improve
the runtime. Finally, the addition of the symmetry-breaking predicate doesn't
impact the performance. Its main purpose is to halve the number of cubes.

We also solved the optimized encoding ($O_3$) of the formula $g(6) \le 17$, which
takes 41.99 seconds using 623 540 conflicts. Adding the symmetry-breaking predicate
($O_4$) reduces the runtime to 17.39 seconds using 316 785 conflicts. So the
symmetry-breaking predicate reduces the number of conflicts by roughly a factor
of 2 (as expected) while the runtime is reduced even more. The latter is due
to the slowdown caused by maintaining more conflict clauses while solving the
formula without the symmetry-breaking predicate.


6.2 Impact of the Partitioning 


All known point sets witnessing the lower bound $h(6) \ge 30$ contain a 7-gon.
To obtain a possibly easier problem to test and compare heuristics, we studied
how many points are required to guarantee the existence of a 6-hole or a 7-gon.
It turned out that the answer is at most 24 (Theorem 2). Computing this
is still hard but substantially easier compared to our main result. During our
experiments, we observed that increasing the number of cubes can increase the
total runtime. We therefore explored which parameters produce the lowest total
runtime. The experimental results are shown in Table 2 for various values for
the parameter $\ell$. Incrementing $\ell$ by 2 increases the number of cubes roughly by
a factor of 3. The optimal total runtime is achieved for $\ell = 15$, which is a 62%
reduction compared to full partitioning ($\ell = 21$). Note that the solving time
for the hardest cube (the max column) increases substantially when using fewer
cubes. This in turn reduces the effectiveness of parallelism. The runtime without
partitioning is expected to be about 1000 CPU hours, so partitioning achieves
super-linear speedups and more than a factor of 4 speedup for $\ell = 15$. Fig. 6
shows plots of cumulatively solved cubes, with similar curves for all settings.


Table 1. Comparison of the different encodings.

formula   #variables    #clauses   #conflicts   #propagations   time (s)
T              62930     1171942      1082569      1338662627     243.07
$O_1$          62930     5823078       228838       282774472     136.20
$O_2$          75110      667005       211272       343388591      45.49
$O_3$          75110      436047       234755       340387692      39.46
$O_4$          75110      444238       234587       342904580      39.41


Fig. 6. Runtime to solve the subproblems of Theorem 2 for various splitting parameters.

We also evaluated the off-the-shelf tool March for partitioning. This tool 
was used to prove Schur Number Five [16]. We used option -d 13 to cut off 
partitioning at depth 13 to create 8192 cubes. That partition turned out to be 
very poor: at least 18 cubes took over 100000 seconds. The expected total costs 
are about 10000 CPU hours, so 10 times the estimated partition-free runtime. 

A partitioning can also guide the search to solve the formula $g(6) \le 17$. The
partitioning of this formula using $\ell = 12$ results in 1108 cubes. If we add these
cubes to the formula with the symmetry-breaking predicate ($O_4$) in the iCNF format [34],
then CaDiCaL can solve it in 8.53 seconds using 205 153 conflicts.
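
The totals in Table 2 can be cross-checked from the number of cubes and the average time per cube. The sketch below copies the table values as we read them (the last row is taken to be $\ell = 7$) and recomputes the total time in hours; small deviations are expected because the averages are rounded to two decimals.

```python
# (l, #cubes, average time in s, reported total time in h), as read from Table 2
rows = [
    (21, 312418, 6.99, 606.55),
    (19, 89384, 13.61, 337.96),
    (17, 25663, 34.29, 244.50),
    (15, 7393, 112.61, 231.27),
    (13, 2149, 431.26, 257.44),
    (11, 629, 1847.46, 322.79),
    (9, 188, 7745.14, 404.47),
    (7, 57, 32905.90, 521.01),
]

def total_hours(cubes, avg_seconds):
    """Total CPU time in hours implied by the cube count and the average time."""
    return cubes * avg_seconds / 3600.0

for l, cubes, avg, reported in rows:
    print(f"l={l:2d}  recomputed={total_hours(cubes, avg):8.2f} h  reported={reported:8.2f} h")

best = min(rows, key=lambda r: r[3])          # row with the lowest total time
reduction = 1 - best[3] / rows[0][3]          # relative to full partitioning (l = 21)
print(f"best l = {best[0]}, reduction = {100 * reduction:.0f}%")
```

The recomputed totals agree with the reported ones to within rounding, and the reduction for the best setting matches the 62% quoted in the text.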


Table 2. Runtime comparison for different values of the partitioning parameter $\ell$.

$\ell$    #cubes   average time (s)   max time (s)   total time (h)
21        312418               6.99          66.86           606.55
19         89384              13.61         123.70           337.96
17         25663              34.29         293.10           244.50
15          7393             112.61         949.50           231.27
13          2149             431.26        3347.59           257.44
11           629            1847.46       11844.05           322.79
 9           188            7745.14       32329.05           404.47
 7            57           32905.90      105937.76           521.01


Fig. 7. Reported process time to solve the subproblems of $h(6) \le 30$ with proof logging
while running a formally-verified checker to validate the solver's output.


6.3 Theorem 1 


To show that the optimized encoding for $h(6) \le 30$ is unsatisfiable, we partitioned
the problem with the splitting algorithm described in Section 5 with
parameter $\ell = 21$, which results in 312418 cubes. We picked this setting based
on the experiments shown in Table 2. Fig. 7 shows the runtime of solving the
subproblems. The average runtime was just below 200 seconds. All subproblems
were solved in less than an hour. Almost 24000 subproblems could be solved
within a second. For these subproblems, the cube resulted directly in a conflict,
so the solver didn't have to do any search.

The total runtime is close to 17 300 CPU hours, or slightly less than 2 CPU
years. We could achieve practically a linear speedup using 1000 m6i.xlarge
instances. The timings include producing and validating the proof as described in
Section 7.1. The combined size of the proofs is 180 terabytes in the uncompressed
LRAT format used by the cakeLPR checker. In past verification efforts of hard
math problems, the produced proofs were in the DRAT format. For this problem,
the LRAT proofs are roughly 2.3 times as large as the corresponding DRAT
proof. We estimate that the DRAT proof would have been 78 terabytes in size,
so approximately one third of the size of the proof of the Pythagorean triples problem [17].
For all problems, the checker was able to easily catch up with the solver while
running on a different core, thereby finishing as soon as the solver was done.
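
The figures quoted in this section are consistent with each other, as the following back-of-the-envelope check illustrates. It uses only numbers stated in the text; the variable names and the 365-day year are our own choices.

```python
cubes = 312418               # number of subproblems
total_cpu_hours = 17300      # reported total runtime
lrat_terabytes = 180         # combined size of the LRAT proofs
lrat_to_drat_ratio = 2.3     # LRAT proofs are roughly 2.3 times as large as DRAT

avg_seconds = total_cpu_hours * 3600 / cubes
cpu_years = total_cpu_hours / (24 * 365)
drat_terabytes = lrat_terabytes / lrat_to_drat_ratio

print(f"{avg_seconds:.1f} s per cube")
print(f"{cpu_years:.2f} CPU years")
print(f"{drat_terabytes:.0f} TB estimated DRAT size")
```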


7 Verification 


We applied three verification steps to increase trust in the correctness of our 
results. In the first step, we check the results produced by the SAT solver. The 
second step consists of checking the correctness of the optimizations discussed 
in Section 4. In the third step, we validate that the case split covers all cases.


7.1 Concurrent Solving and Checking


The most commonly used approach to validate SAT-solving results works as
follows. First, a SAT solver produces a DRAT proof. This proof is checked and
trimmed using an unverified efficient tool that produces an LRAT proof. The
difference between a DRAT proof and an LRAT proof is that the latter contains
hints. The LRAT proof is then validated by a formally-verified checker, which
uses the hints to obtain efficient performance.

Recently, the SAT solver CaDiCaL added support for producing LRAT proofs 
directly (since version 1.7.0). This allows us to produce the proof and validate 
it concurrently. To the best of our knowledge, we are the first to take advantage 
of this possibility. CaDiCaL sends its proof to a pipe and the verified checker 
cakeLPR reads it from the pipe. This tool chain works remarkably well and adds 
little overhead while avoiding storing large files. 


7.2  Reencoding Proof 


We validated the four optimizations presented in Section 4. Only the trusted
encoding has the reflection symmetry, as none of the optimizations preserves
this symmetry. Each of the clauses in the symmetry-breaking predicate has the
substitution redundancy (SR) property [5] with respect to the trusted encoding.
However, there doesn't exist an SR checker. Instead, we transformed the SR check
into a sequence of DRAT addition and deletion steps. This is feasible for small
point sets (up to 10 points), but is too expensive for the full problem. It may
therefore be more practical to verify this optimization in a theorem prover.
Transforming the trusted encoding into the domain-consistent one is challenging
to validate because the solver cannot easily infer the existence of a 6-hole
using only the clauses (7). Since we are replacing (4) by (7) and clause deletion
trivially preserves satisfiability, we only need to check whether each of the clauses
(7) is entailed by the trusted encoding. This can be achieved by constructing a
formula that asks whether there exists an assignment that satisfies the trusted
encoding, but falsifies at least one of the clauses (7). We validated that this
formula is unsatisfiable for n < 12 (around 300 seconds).† The formula becomes
challenging to solve for larger n. However, the validation for small n provides
substantial evidence of the correctness of the encoding and the implementation.
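
Such an entailment query can be phrased as a single CNF formula. The sketch below shows one generic construction, which is not necessarily the one used by the authors' tool: each candidate clause gets a fresh selector variable that forces all of its literals to be false, and one extra clause requires at least one selector to be true. Clauses are lists of DIMACS-style integer literals, and the brute-force `satisfiable` helper is only suitable for toy inputs.

```python
from itertools import product

def entailment_check_cnf(formula, candidates, num_vars):
    """CNF that is satisfiable iff `formula` has a model falsifying some candidate."""
    cnf = [list(c) for c in formula]
    selectors = []
    for j, clause in enumerate(candidates):
        s = num_vars + 1 + j            # fresh selector variable for candidate j
        selectors.append(s)
        for lit in clause:
            cnf.append([-s, -lit])      # s implies that every literal of the clause is false
    cnf.append(selectors)               # at least one candidate clause is falsified
    return cnf, num_vars + len(candidates)

def satisfiable(cnf, num_vars):
    """Brute-force SAT check, only meant for tiny illustrative formulas."""
    for bits in product([False, True], repeat=num_vars):
        if all(any(bits[abs(l) - 1] == (l > 0) for l in c) for c in cnf):
            return True
    return False

# Toy example: F = (x1 -> x2) and (x2 -> x3) entails (-x1 or x3) but not (x1 or x3).
F = [[-1, 2], [-2, 3]]
cnf_ok, n_ok = entailment_check_cnf(F, [[-1, 3]], 3)
cnf_bad, n_bad = entailment_check_cnf(F, [[-1, 3], [1, 3]], 3)
print(satisfiable(cnf_ok, n_ok), satisfiable(cnf_bad, n_bad))
```

An unsatisfiable result means that every candidate clause is entailed; a satisfying assignment pinpoints a candidate that is not.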
Checking the correctness of the other two optimizations is easier. Observe
that one can obtain the domain-consistent encoding from the $O(n^4)$ encoding
by applying Davis-Putnam resolution [6] on the auxiliary variables. This can be
expressed using DRAT steps. The DRAT derivation from the domain-consistent
encoding to the $O(n^4)$ encoding applies all these steps in reverse order. The
minor optimizations mostly delete clauses, which is trivially correct for proofs
of unsatisfiability. The clauses (19) have the RAT property on the auxiliary
variables and their redundancy can easily be checked using a DRAT checker.
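
Davis-Putnam resolution on an auxiliary variable can be illustrated on a tiny instance. In the sketch below, the variable numbering is hypothetical (1, 2, 3 stand for the orientation variables of the triples (a,b,c), (b,c,d), (a,c,d), and 4 for the auxiliary 4-gon variable above the line ad), and the two input clauses follow our reading of (8) and of the first clause of (19). Eliminating the auxiliary variable then yields a clause of the shape used in (6).

```python
def eliminate(clauses, var):
    """Davis-Putnam elimination: replace the clauses containing `var` by their
    pairwise resolvents on `var`; tautological resolvents are dropped."""
    pos = [c for c in clauses if var in c]
    neg = [c for c in clauses if -var in c]
    rest = [frozenset(c) for c in clauses if var not in c and -var not in c]
    resolvents = set()
    for p in pos:
        for q in neg:
            r = (set(p) - {var}) | (set(q) - {-var})
            if not any(-lit in r for lit in r):
                resolvents.add(frozenset(r))
    return rest + sorted(resolvents, key=sorted)

O_ABC, O_BCD, O_ACD, U4_ACD = 1, 2, 3, 4      # hypothetical variable numbering
clauses = [
    [U4_ACD, O_ABC, O_BCD],                   # definition of the auxiliary variable
    [-U4_ACD, -O_ACD],                        # the auxiliary variable implies not o_{a,c,d}
]
result = eliminate(clauses, U4_ACD)
print([sorted(c) for c in result])
```

Running the steps in the opposite direction, that is, introducing the definition and then deriving the short clauses, corresponds to the reverse derivation mentioned above.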


† We implemented an entailment tool, see https://github.com/marijnheule/entailment


7.3 Tautology Proof


The final validation step consists of checking whether the partition of the problem 
covers the entire search space. This part has also been called the tautology 
proof [16], because in most cases it needs to determine whether the disjunction 
of cubes is a tautology. We take a slightly different approach and validate that 
the following formula is unsatisfiable: the conjunction of the negated cubes; the 
symmetry-breaking predicate; and some clauses from the formula. 

Recall that we omitted various cubes because they resulted in a conflict with
the clauses $(\lnot o_{a,a+1,a+2} \lor \lnot o_{a+1,a+2,a+3} \lor \lnot o_{a+2,a+3,a+4})$ with $a \in \{2, \ldots, n-4\}$ and
$(o_{a,a+1,a+2} \lor o_{a+1,a+2,a+3} \lor o_{a+2,a+3,a+4} \lor o_{a+3,a+4,a+5})$ with $a \in \{2, \ldots, n-5\}$.
We checked with DRAT-trim that these clauses are implied by the optimized
formulas, which takes 0.3 CPU seconds. We combined them with the negated
cubes and the symmetry-breaking predicate, which results in an unsatisfiable
formula that can be solved by CaDiCaL in 12 CPU seconds.
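
The idea behind this coverage check can be reproduced on a small scale. The sketch below is a simplification under our own assumptions: it uses a short sequence of splitting variables numbered 1..length, omits the symmetry-breaking predicate, and decides satisfiability by brute force. It builds the negated cubes together with the two clause families above and confirms that the result is unsatisfiable, while dropping a single cube leaves a gap.

```python
from itertools import product

def valid(bits):
    """No three consecutive True and no four consecutive False entries."""
    s = "".join("+" if b else "-" for b in bits)
    return "+++" not in s and "----" not in s

def coverage_formula(length):
    """Negated cubes plus the clauses that rule out the omitted assignments."""
    cubes = [bits for bits in product([True, False], repeat=length) if valid(bits)]
    # Negating a cube (a conjunction of literals) gives a clause of flipped literals.
    cnf = [[-(i + 1) if b else (i + 1) for i, b in enumerate(bits)] for bits in cubes]
    cnf += [[-(i + 1), -(i + 2), -(i + 3)] for i in range(length - 2)]
    cnf += [[i + 1, i + 2, i + 3, i + 4] for i in range(length - 3)]
    return cnf

def satisfiable(cnf, num_vars):
    """Brute-force SAT check for tiny formulas."""
    for bits in product([False, True], repeat=num_vars):
        if all(any(bits[abs(l) - 1] == (l > 0) for l in c) for c in cnf):
            return True
    return False

length = 6
cnf = coverage_formula(length)
print(satisfiable(cnf, length))        # all cases covered: no assignment escapes
print(satisfiable(cnf[1:], length))    # one cube missing: the gap is found
```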


8 Conclusion 


We closed the final case regarding k-holes in the plane by showing h(6) = 30. 
This is another example that SAT-solving techniques can effectively solve a range 
of long-standing open problems in mathematics. Other successes include the 
Pythagorean triples problem [17], Schur number five [16], and Keller’s conjec- 
ture [4]. Also, we recomputed g(6) = 17 many orders of magnitude faster com- 
pared to the original computation by Szekeres and Peters [31] even when taking 
into account the difference in hardware. So, SAT techniques overwhelmingly out- 
performed a dedicated approach on this geometry problem. Key contributions in- 
clude an effective, compact encoding and a partitioning strategy enabling linear- 
time speedups even when using thousands of cores. We also presented a new 
concurrent proof-checking procedure to significantly decrease validation costs. 

Although the tools are fully automatic, some aspects of our solution require 
the ingenuity of the user. In particular, we had to develop encoding optimizations 
and a search-space partitioning strategy to take full advantage of the power of 
the tools. Constructing the domain-consistent encoding automatically appears
challenging. Most other optimizations can be achieved automatically, for example
via structured bounded variable addition [13]. However, the resulting formula
cannot be solved as efficiently as the presented one. Substantial research into
generating effective partitionings is required to enable non-experts to solve such 
hard problems. Although we validated most steps, formally verifying the trusted 
encoding or even the domain-consistent encoding would further increase trust in 
the correctness of our result.

Abstract. Generalized Reactivity(1) (GR(1)) synthesis is a reactive
synthesis approach in which the specification is split into two parts: a
symbolic game graph, describing the safe transitions of a system, and a liveness
specification in a subset of Linear Temporal Logic (LTL) on top of it.
Many specifications can naturally be written in this restricted form, and 
the restriction gives rise to a scalable synthesis procedure — the reasons 
for the high popularity of the approach. For specifications even slightly 
beyond GR(1), however, the approach is inapplicable. This necessitates a 
transition to synthesizers for full LTL specifications, introducing a huge 
efficiency drop. This paper proposes a synthesis approach that smoothly 
bridges the efficiency gap from GR(1) to LTL by unifying synthesis for 
both classes of specifications. The approach leverages a recently intro- 
duced canonical representation of omega-regular languages based on a 
chain of good-for-games co-Büchi automata (COCOA). By constructing 
COCOA for the liveness part of a specification, we can then build a 
fixpoint formula that can be efficiently evaluated on the symbolic game 
graph. The COCOA-based synthesis approach outperforms standard ap- 
proaches and retains the efficiency of GR(1) synthesis for specifications 
in GR(1) form and those with few non-GR(1) specification parts. 


1 Introduction 


Reactive synthesis is the process of automatically computing a provably correct 
reactive system from its formal specification [13]. A safety-critical system is 
often developed twice: first, when it is described using a formal specification, 
and second, when a system is implemented according to this specification. The 
dream of reactive synthesis is to fully eliminate the manual implementation phase.

Reactive synthesis is however computationally hard. For specifications in the 
commonly used linear temporal logic (LTL), checking whether an implementa- 
tion exists is 2EXPTIME-complete [30]. The classical approach to solve reactive 
synthesis from LTL is to first translate the LTL formula into a deterministic 
parity automaton, followed by solving the induced two-player parity game [7]. 
The system player wins this game if and only if there is an implementation 
satisfying the specification. It is the first phase, translating LTL to a parity 
automaton, that usually represents a bottleneck. This observation spurred a series 
of synthesis approaches. For instance, in bounded synthesis, either the maximal 
number of states that a system can have [22] or the longest system response 
time [20] is restricted. If there exists a system realizing the specification, then 
there exists one that adheres to some bounds, and bounded synthesis works well 
whenever small bounds suffice for realizing the given specification. Another ap- 
proach is to synthesize implementations for parts of the specification, and to then 
compose them into one that realizes the whole specification [25,31,21]. The ap- 
proach of [26] avoids constructing one large deterministic parity automaton and 
instead constructs many smaller ones that—when composed together—represent 
the original specification. Such decomposition proved beneficial on practical ex- 
amples [1]. Finally, there are approaches that consider “synthesis-friendly” sub- 
sets of LTL. Alur and La Torre identified a number of such LTL fragments with 
a simpler synthesis problem [3], and this eventually led to the introduction of 
Generalized Reactivity(1) synthesis by Piterman et al. [28], GR(1) for short. 
GR(1) synthesis gained a lot of prominence and was applied in domains such 
as robotics [34,24], cyber-physical system control [36,35], and chip component 
design [8,23]. We describe it in more detail. 


In GR(1) synthesis, the specification is divided into two parts. The first part 
represents the safety properties of a system and encodes a symbolic game graph. 
Each graph vertex encodes a valuation of the last system inputs and outputs. The 
transitions in the graph represent how these variables can evolve in one step. 
For instance, a robot on a grid can move from its current cell to the left, right, 
up, or down, but cannot jump; this is easily encoded as a symbolic game graph. 
Secondly, there are liveness properties of the following form: if certain vertices 
are visited infinitely often, then certain other vertices must be visited infinitely 
often as well. The liveness properties are encoded symbolically using LTL formulas 
of the shape $\bigwedge_i \mathsf{GF}\varphi_i \rightarrow \bigwedge_j \mathsf{GF}\psi_j$, where $\varphi_i$ and $\psi_j$ are Boolean formulas over 
input and output system propositions. Synthesis problems from many domains 
can be encoded naturally, or after some manual effort, into the GR(1) setting. 
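
To make the shape of these liveness properties concrete, the following minimal Python sketch checks an implication between "infinitely often" conditions on an ultimately periodic word (a finite prefix followed by a cycle repeated forever). The function name `gr1_liveness_holds` and the encoding of Boolean formulas as Python predicates over sets of true propositions are illustrative assumptions, not part of the GR(1) tool chain.

```python
def gr1_liveness_holds(cycle, assumptions, guarantees):
    """Check (AND_i GF phi_i) -> (AND_j GF psi_j) on a word prefix . cycle^omega.

    Only the cycle matters: a Boolean formula holds infinitely often
    iff it holds at some position of the cycle that is repeated forever.
    `cycle` is a non-empty list of valuations (frozensets of true propositions).
    """
    def infinitely_often(pred):
        return any(pred(valuation) for valuation in cycle)

    # If some assumption holds only finitely often, the implication is true.
    if not all(infinitely_often(a) for a in assumptions):
        return True
    return all(infinitely_often(g) for g in guarantees)


# Hypothetical property: GF u -> (GF x AND GF y)
assumptions = [lambda val: "u" in val]
guarantees = [lambda val: "x" in val, lambda val: "y" in val]

good_cycle = [frozenset({"u", "x"}), frozenset({"y"})]
bad_cycle = [frozenset({"u", "x"})]   # y never holds, but u holds forever
idle_cycle = [frozenset({"x"})]       # u holds only finitely often

print(gr1_liveness_holds(good_cycle, assumptions, guarantees))
print(gr1_liveness_holds(bad_cycle, assumptions, guarantees))
print(gr1_liveness_holds(idle_cycle, assumptions, guarantees))
```

The third call illustrates the assume-guarantee reading: when the environment stops producing `u`, the system is not obliged to satisfy its guarantees.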


Constraining specifications to GR(1) form reduces the synthesis problem's 
complexity from doubly-exponential to singly-exponential (in the number of 
propositions), or polynomial when the number of propositions is fixed [8]. The 
GR(1) synthesis problem can be solved by evaluating a fixpoint formula on the 
symbolic game arena. The fixpoint formula defines the set of vertices from which 
the system player satisfies the GR(1) liveness properties while staying in the 
game arena. The simple shape of GR(1) liveness properties makes the fixpoint 
formula simple. Moreover, evaluating the fixpoint formula on the symbolic game 
graph can be done efficiently using Binary Decision Diagrams (BDDs, [12]) as the 
underlying data structure. These factors together — efficient implementation and 
relatively expressive specification language — made GR(1) synthesis popular. 

GR(1) synthesis has a drawback. A single property outside of GR(1), for 
instance "eventually the robot always stays in some stable zone" (FG inStableZone), 
makes GR(1) synthesis inapplicable. Switching to full-LTL synthesizers introduces 
an abrupt efficiency drop, as they do not take advantage of the simple 
structure of GR(1)-like specifications. To improve the practical applicability 
of reactive synthesis, a synthesis approach exhibiting a smooth efficiency curve 
on the way from GR(1) to LTL would hence be useful. While there are 
some GR(1) synthesis extensions (e.g., [4,17]), they only extend it by certain 
specification classes and consequently do not support full LTL. 

This paper unifies synthesis for GR(1) and full LTL. As in GR(1) synthesis, 
we aim at synthesis for specifications split into a safety part, encoded as a 
symbolic game graph, and a liveness part. Unlike in standard GR(1) synthesis, 
the liveness part can be any LTL or omega-regular property. For standard GR(1) 
specifications, our approach inherits the efficiency of GR(1) synthesis, including 
when a specification does not fall syntactically into this class, but is semantically 
a GR(1) specification. At the same time, for specifications that go beyond GR(1) 
and only have a few non-GR(1) components, our approach scales well. 

Our solution builds on the same idea of evaluating a fixpoint formula on a 
symbolic game graph. Our starting point is a folklore approach based on solving parity games by 
evaluating fixpoint equations [11]. We modify it so that it becomes applicable to 
specifications given in the form of a chain of good-for-games co-Büchi automata 
(COCOA). Such chains have recently been proposed as a new canonical repre- 
sentation of omega-regular languages [19], and it has been shown how minimal 
and canonical COCOA can be computed in polynomial time from a deterministic 
parity automaton of the language. Our COCOA-based synthesis approach con- 
verts the liveness part of the specification into a parity automaton, constructs 
the chain, builds the fixpoint formula from the chain, and finally evaluates it 
on the symbolic game graph. We show that the fixpoint formula built from the 
chain has a structure similar to GR(1) fixpoint formulas. This is not the case 
for the folklore approach via parity games, and as a result, our COCOA-based 
synthesizer is roughly an order of magnitude faster. The COCOA-based synthe- 
sis approach inherits the efficiency of GR(1) synthesis, and it is also efficient on 
specifications slightly beyond GR(1). Finally, our approach is the first applica- 
tion of the new canonical representation of omega-regular languages. 


2 Preliminaries 


Automata and languages 


Let $\mathbb{N} = \{0,1,2,\ldots\}$ be the set of natural numbers including 0. Let $AP$ be a 
set of atomic propositions; $2^{AP}$ denotes the valuations of these propositions. A 
Boolean formula represents a set of valuations: for instance, $\bar{a} \wedge b$, also written 
$\bar{a}b$, encodes valuations in which proposition $a$ has value false and $b$ is true. A 
Boolean function maps valuations of propositions to either true or false. Binary 
decision diagrams (BDDs) are a data structure for manipulating such functions. 
A word is a sequence of proposition valuations $w = x_0 x_1 \ldots \in (2^{AP})^\omega \cup (2^{AP})^*$. 
A word can be finite or infinite. A language is a set of infinite words. Given a 
language $L$, the suffix language of $L$ for some finite word $p \in (2^{AP})^*$ is $\mathcal{L}(L,p) = 
\{x_0 x_1 \ldots \in (2^{AP})^\omega \mid p \cdot x_0 x_1 \ldots \in L\}$. The words in this set are called suffix 
words. The set of all suffix languages of $L$ is the set $\{\mathcal{L}(L,p) \mid p \in (2^{AP})^*\}$. 


86 R. Ehlers and A. Khalimov 


Automata over infinite words are used to finitely represent languages. We 
consider parity and co-Büchi automata with transition-based acceptance. A parity 
automaton is a tuple $A = (\Sigma, Q, q_0, \delta)$ with a finite alphabet $\Sigma$ (usually 
$\Sigma = 2^{AP}$), a finite set of states $Q$, an initial state $q_0 \in Q$, and a finite transition 
relation $\delta \subseteq Q \times \Sigma \times Q \times \mathbb{N}$ satisfying $(q,x,q',c) \in \delta \Rightarrow (q,x,q',c') \notin \delta$ for all 
$q, x, q'$ and $c' \neq c$. An automaton is complete if for every state $q$ and letter $x$ 
there exists at least one pair $(q',c) \in Q \times \mathbb{N}$ s.t. $(q,x,q',c) \in \delta$; it is deterministic 
if exactly one such pair $(q',c)$ exists. W.l.o.g. we assume that automata are 
complete. An automaton is co-Büchi if only colors 1 and 2 occur in $\delta$, and then 
we call the transitions with color 1 rejecting and those with color 2 accepting. 

A run of $A$ on a word $w = x_0 x_1 \ldots \in \Sigma^\omega$ is a sequence $\pi = \pi_0 \pi_1 \ldots \in Q^\omega$ 
starting in $\pi_0 = q_0$ and such that $(\pi_i, x_i, \pi_{i+1}, c_i) \in \delta$ for some $c_i$ for every $i \in \mathbb{N}$; 
the induced color sequence $c = c_0 c_1 \ldots$ is uniquely defined by $w$ and $\pi$. A run 
is accepting if the lowest color occurring infinitely often in the induced color 
sequence is even ("min-even acceptance"). When this minimal color is uniquely 
defined, e.g. when there is only one accepting run, it is called the color of $w$ 
w.r.t. $A$. A word is accepted if it has an accepting run. The automaton's language 
$L(A)$ is the set of accepted words. The language of the automaton $A'$ derived 
from $A$ by changing the initial state to $q$ is denoted by $L(A, q)$. 
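
As an illustration of min-even acceptance with transition-based colors, the following Python sketch decides whether a deterministic parity automaton accepts an ultimately periodic word. The dictionary encoding of the transition function and the name `accepts_lasso` are assumptions made for this example only.

```python
def accepts_lasso(delta, q0, prefix, cycle):
    """Min-even acceptance of the word prefix . cycle^omega.

    `delta` maps (state, letter) to (successor, color) and is assumed to be
    deterministic and complete; `cycle` must be non-empty.
    """
    q = q0
    for letter in prefix:
        q, _ = delta[(q, letter)]

    # Read the cycle repeatedly until the state at a cycle boundary repeats.
    seen = {}            # state at a cycle boundary -> index of that round
    round_minima = []    # minimal color seen in each round
    while q not in seen:
        seen[q] = len(round_minima)
        minimum = None
        for letter in cycle:
            q, color = delta[(q, letter)]
            minimum = color if minimum is None else min(minimum, color)
        round_minima.append(minimum)

    # Rounds from the first repeated boundary state onwards recur forever,
    # so their colors are exactly the colors occurring infinitely often.
    recurring = round_minima[seen[q]:]
    return min(recurring) % 2 == 0


# One-state automaton for "a holds infinitely often":
# letter "a" has color 0, letter "n" (a is false) has color 1.
gf_a = {("q", "a"): ("q", 0), ("q", "n"): ("q", 1)}

print(accepts_lasso(gf_a, "q", [], ["n", "a"]))   # a recurs
print(accepts_lasso(gf_a, "q", ["a"], ["n"]))     # a occurs only once
```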

A co-Büchi language is a language representable by a nondeterministic (equiv., 
deterministic) co-Büchi automaton. The co-Büchi languages are a strict subset 
of the omega-regular languages. 

An automaton is good-for-games if there exists a strategy $f : \Sigma^* \to Q$ to 
resolve the nondeterminism to produce accepting runs on the accepted words, 
formally: for every infinite word $w = x_0 x_1 \ldots$, the sequence $\pi_0 \pi_1 \ldots$ defined by 
$\pi_i = f(x_0 \ldots x_{i-1})$ for all $i \in \mathbb{N}$ is a run, and it is accepting whenever $w$ belongs 
to the language. 


Games and our realizability problem 


LTL. A commonly used formalism to represent system specifications is Linear 
Temporal Logic (LTL, [29]). It uses temporal operators U, X, and derived ones 
G and F, which we do not define here. For details, we refer the reader to [27]. 


Games. An edge-labelled game is a tuple $G = (AP_I, AP_O, V, v_0, \delta, obj)$ where 
$V$ is a finite set of vertices, $v_0 \in V$ is initial, $\delta : V \times 2^{AP_I} \times 2^{AP_O} \to V$ is 
a partial function describing possible moves (safety specification), and $obj$ is a 
winning objective (liveness specification). A play is a maximal (finite or infinite) 
sequence of transitions of the form $(v_0, i_0, o_0, v_1)(v_1, i_1, o_1, v_2)(v_2, i_2, o_2, v_3)\ldots$; 
the corresponding sequence $(i_0 \cup o_0)(i_1 \cup o_1)\ldots$ is called the action sequence. 
An infinite play is winning for the system if it satisfies the objective $obj$; when 
$obj$ is an LTL objective over $AP_I \cup AP_O$, the infinite play satisfies $obj$ iff the 
action sequence satisfies it. A system strategy is a function $f : (2^{AP_I})^* \to 2^{AP_O}$. 
The game is won by the system if it has a strategy $f$ such that every play 
$(v_0, i_0, o_0, v_1)(v_1, i_1, o_1, v_2)\ldots$ is infinite and it satisfies the objective, where $o_j = 
f(i_0 \ldots i_j)$ for all $j$. To define parity games, the winning objective $obj$ is set to be 
a parity-assigning function $obj : V \to \mathbb{N}$, and then an infinite play satisfies $obj$ 
iff the minimal parity visited infinitely often in the sequence $obj(v_0)obj(v_1)\ldots$ 
is even (min-even acceptance on states). 

The enforceable predecessor operator $\Box\Diamond$ reads a set of tuples $\varphi \subseteq 2^{AP} \times V$ 
and returns the set of positions from which the system can enforce taking one 
of the transitions into the destination set: 


$$\Box\Diamond(\varphi) = \{v \in V \mid \forall i.\, \exists o : (i \cup o, \delta(v,i,o)) \in \varphi\} \tag{1}$$
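
The operator can be sketched on an explicit-state game as follows. This is a minimal Python illustration, not the symbolic BDD-based implementation the paper has in mind; the function name `enf_pre`, the dictionary encoding of the partial move function, and the three-vertex example game are assumptions made here.

```python
def enf_pre(V, inputs, outputs, delta, phi):
    """Vertices from which, for every input i, the system has an output o such
    that the move (v, i, o) is defined and (i | o, successor) lies in phi.

    `delta` is a dict (vertex, input, output) -> vertex encoding the partial
    move function; inputs, outputs and labels are frozensets of propositions.
    """
    return {
        v for v in V
        if all(
            any((v, i, o) in delta and (i | o, delta[(v, i, o)]) in phi
                for o in outputs)
            for i in inputs)
    }


# Hypothetical game: one input proposition u, one output proposition x.
EMPTY, IN_U, OUT_X = frozenset(), frozenset({"u"}), frozenset({"x"})
V = {0, 1, 2}
INPUTS = [EMPTY, IN_U]
OUTPUTS = [EMPTY, OUT_X]
DELTA = {}
for i in INPUTS:
    DELTA[(0, i, OUT_X)] = 1   # emitting x moves from 0 to 1
    DELTA[(0, i, EMPTY)] = 0   # otherwise stay in 0
    DELTA[(1, i, EMPTY)] = 0   # from 1, x is not allowed
    DELTA[(2, i, EMPTY)] = 2   # 2 is a sink without x

# Destination set: any transition that ends in vertex 1.
TARGET = {(i | o, 1) for i in INPUTS for o in OUTPUTS}
print(enf_pre(V, INPUTS, OUTPUTS, DELTA, TARGET))
```

Only vertex 0 can force a move into vertex 1 here, whatever input the environment picks.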


Symbolic games with LTL objectives. Games can be represented symbolically. 
For instance, the vertices can be encoded as valuations of Boolean variables AP, 
and transitions between the vertices can be encoded using a Boolean formula. 
This paper focuses on solving symbolic games with LTL objectives: 


Given a symbolic game with an LTL objective, who wins the game? 


The particular symbolic representation is not important as long as it provides 
the operations for union, intersection, and complementation of sets of 
label-position tuples, and the enforceable predecessor operator $\Box\Diamond$. This paper focuses 
exclusively on the realizability problem; the extraction of compact and efficient 
implementations merits a separate study. 


Mu-calculus fixpoint formulas. For an introduction to using fixpoint formulas in 
synthesis, we refer the reader to [7], and to [10,5] for mu-calculus in general. The 
fixpoint formulas use the greatest ($\nu$) and least ($\mu$) fixpoint operators, and the 
enforceable-predecessor operator $\Box\Diamond$. For instance, the formula $\nu Y.\mu X.\Box\Diamond(Y \wedge 
(x \vee X))$ represents the biggest set of vertices such that from all vertices in the 
set, the system can enforce that either $x$ holds along the next transition 
and this transition leads back to the same set, or the play gets closer to a position 
from which this can be enforced. This formula hence characterizes the positions 
from which the system can enforce that $x$ holds infinitely often along a play. 
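
The nested evaluation of such a formula can be sketched in Python on an explicit-state game. The code below is an illustrative assumption (explicit sets instead of BDDs, hypothetical names `enf_pre` and `buchi_win`, and a made-up three-vertex game); it iterates the inner least fixpoint from the empty set and the outer greatest fixpoint from the full vertex set.

```python
def enf_pre(V, inputs, outputs, delta, phi):
    """Enforceable predecessor: for all inputs there is an output whose
    (label, successor) pair lies in phi."""
    return {v for v in V
            if all(any((v, i, o) in delta and (i | o, delta[(v, i, o)]) in phi
                       for o in outputs) for i in inputs)}


def buchi_win(V, inputs, outputs, delta, prop):
    """Positions from which the system can enforce that `prop` holds
    infinitely often: greatest fixpoint over Y, least fixpoint over X."""
    labels = [i | o for i in inputs for o in outputs]
    Y = set(V)
    while True:
        X = set()
        while True:
            # Transitions that stay in Y and either satisfy prop or reach X.
            phi = {(lab, v) for lab in labels for v in Y
                   if prop in lab or v in X}
            X_new = enf_pre(V, inputs, outputs, delta, phi)
            if X_new == X:
                break
            X = X_new
        if X == Y:
            return Y
        Y = X


# Hypothetical game: x may be emitted in vertex 0 only; vertex 2 is a sink.
EMPTY, IN_U, OUT_X = frozenset(), frozenset({"u"}), frozenset({"x"})
V = {0, 1, 2}
INPUTS = [EMPTY, IN_U]
OUTPUTS = [EMPTY, OUT_X]
DELTA = {}
for i in INPUTS:
    DELTA[(0, i, OUT_X)] = 1
    DELTA[(0, i, EMPTY)] = 0
    DELTA[(1, i, EMPTY)] = 0
    DELTA[(2, i, EMPTY)] = 2

print(buchi_win(V, INPUTS, OUTPUTS, DELTA, "x"))
```

In this game the system wins from vertices 0 and 1 by emitting `x` every second step, while vertex 2 never allows `x`.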


Generalized Reactivity(1) 


Generalized Reactivity(1) is a class of assume-guarantee specifications that in- 
cludes safety and liveness components. It gained popularity because many spec- 
ifications naturally fall into the GR(1) class, and the restricted nature of GR(1) 
admits an efficient synthesis approach. For the purpose of this paper, we define 
a GR(1) specification as a game $G_{GR1} = (AP_I, AP_O, V, v_0, \delta, \Phi)$ with an LTL winning 
objective of the form $\Phi = \bigwedge_{i=1}^{m} \mathsf{GF} a_i \rightarrow \bigwedge_{j=1}^{n} \mathsf{GF} g_j$, where each assumption 
$a_i$ and each guarantee $g_j$ is a Boolean formula over $AP_I \cup AP_O$. The original GR(1) 
specification class [28] uses logical formulas to describe the symbolic arena. 


Solving GR(1) games using fixpoints 


We now show how to solve GR(1) games by evaluating fixpoint formulas on 
GR(1) game arenas. Consider a GR(1) game $G_{GR1} = (AP_I, AP_O, V, v_0, \delta, \Phi)$ with 
$\Phi = \bigwedge_{i=1}^{m} \mathsf{GF} a_i \rightarrow \bigwedge_{j=1}^{n} \mathsf{GF} g_j$. The set of positions $W \subseteq V$ from which the 
system player wins the game is characterized by the fixpoint equation [18,8]: 


$$W = \nu Z. \bigwedge_{j=1}^{n} \mu Y. \bigvee_{i=1}^{m} \nu X.\, \Box\Diamond\big((g_j \wedge Z) \vee Y \vee (\neg a_i \wedge X)\big) \tag{2}$$


This fixpoint formula ensures that the system chooses to move into states of 
one of the three kinds: (1) states where it waits for an environment goal $a_i$ to 
be reached, possibly forever ($\neg a_i \wedge X$), (2) states that move the system closer 
to reaching its goal number $j$ ($Y$), or (3) winning states that satisfy system 
goal number $j$ ($g_j \wedge Z$). The conjunction over all guarantees to the right of 
$\nu Z$ ensures that all liveness guarantees are satisfied from all winning positions 
(unless some environment liveness assumption is violated). The disjunction over 
the environment goals permits the system to wait for the satisfaction of any of 
the environment liveness goals. At the end of evaluating the fixpoint formula, 
$Z$ consists of the winning positions for the system. The system wins the GR(1) 
game if and only if $W$ includes $v_0$. 
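
The three-level nesting (greatest over Z, least over Y, greatest over X) can be sketched as explicit-set iteration in Python. This is an illustrative assumption rather than the BDD-based procedure used by GR(1) tools: the names `enf_pre` and `gr1_win`, the encoding of assumptions and guarantees as predicates over labels, and the two-vertex game are all hypothetical. The sketch expects at least one assumption (use a constant-true predicate if there is none).

```python
def enf_pre(V, inputs, outputs, delta, phi):
    """Enforceable predecessor: for all inputs there is an output whose
    (label, successor) pair lies in phi."""
    return {v for v in V
            if all(any((v, i, o) in delta and (i | o, delta[(v, i, o)]) in phi
                       for o in outputs) for i in inputs)}


def gr1_win(V, inputs, outputs, delta, assumptions, guarantees):
    """Evaluate nu Z. AND_j mu Y. OR_i nu X. EnfPre(g_j&Z | Y | !a_i&X)."""
    labels = [i | o for i in inputs for o in outputs]
    Z = set(V)
    while True:
        Z_new = set(V)
        for g in guarantees:                 # conjunction over guarantees
            Y = set()
            while True:                      # least fixpoint over Y
                Y_new = set()
                for a in assumptions:        # disjunction over assumptions
                    X = set(V)
                    while True:              # greatest fixpoint over X
                        phi = {(lab, v) for lab in labels for v in V
                               if (g(lab) and v in Z) or v in Y
                               or (not a(lab) and v in X)}
                        X_new = enf_pre(V, inputs, outputs, delta, phi)
                        if X_new == X:
                            break
                        X = X_new
                    Y_new |= X
                if Y_new == Y:
                    break
                Y = Y_new
            Z_new &= Y
        if Z_new == Z:
            return Z
        Z = Z_new


# Hypothetical game for GF u -> (GF x & GF y): vertex 0 allows x or y
# (not both at once), vertex 1 never allows y.
def fs(*props):
    return frozenset(props)

V = {0, 1}
INPUTS = [fs(), fs("u")]
OUTPUTS = [fs(), fs("x"), fs("y"), fs("x", "y")]
DELTA = {}
for i in INPUTS:
    for o in (fs(), fs("x"), fs("y")):
        DELTA[(0, i, o)] = 0
    for o in (fs(), fs("x")):
        DELTA[(1, i, o)] = 1

ASSUMPTIONS = [lambda lab: "u" in lab]
GUARANTEES = [lambda lab: "x" in lab, lambda lab: "y" in lab]
print(gr1_win(V, INPUTS, OUTPUTS, DELTA, ASSUMPTIONS, GUARANTEES))
```

Vertex 1 is losing because the environment can raise `u` forever while `y` is never available there.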


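Example. Consider a GR(1) game with $AP_I = \{u\}$, $AP_O = \{x,y\}$, and $\Phi = 
\mathsf{GF}u \rightarrow (\mathsf{GF}x \wedge \mathsf{GF}y)$. Equation 2 becomes: 


$$W = \nu Z. \begin{pmatrix} \mu Y.\nu X.\, \Box\Diamond(xZ \vee Y \vee \bar{u}X) \;\wedge \\ \mu Y.\nu X.\, \Box\Diamond(yZ \vee Y \vee \bar{u}X) \end{pmatrix} \tag{3}$$


For conciseness, we write $xZ$ instead of $x \wedge Z$, and $\bar{a}$ instead of $\neg a$. 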


Solving symbolic parity games using fixpoints 


Consider a parity game $(AP_I, AP_O, V, v_0, \delta, c)$ with colors $\{0,\ldots,n\}$. The winning 
positions for the system player in such a game are characterized by the fixpoint 
formula from [33,11] adapted to our setting: 


$$W = \nu X^0.\mu X^1.\ldots\sigma X^n.\, \Box\Diamond\Big(\bigvee_{i=0}^{n} color_i \wedge X^i\Big) \tag{4}$$


The operators $\nu$ and $\mu$ alternate, so the symbol $\sigma$ is $\mu$ if $n$ is odd and $\nu$ if $n$ is 
even; $color_i = \{v \mid c(v) = i\}$ denotes the set of vertices of color $i$. 
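
A recursive evaluation of this alternating fixpoint can be sketched in Python as follows. The explicit-set representation, the names `enf_pre` and `parity_win`, and the example game with its coloring are assumptions for illustration; even levels start from the full vertex set (greatest fixpoint) and odd levels from the empty set (least fixpoint).

```python
def enf_pre(V, inputs, outputs, delta, phi):
    """Enforceable predecessor: for all inputs there is an output whose
    (label, successor) pair lies in phi."""
    return {v for v in V
            if all(any((v, i, o) in delta and (i | o, delta[(v, i, o)]) in phi
                       for o in outputs) for i in inputs)}


def parity_win(V, inputs, outputs, delta, color):
    """Winning positions for min-even parity with colors on vertices."""
    n = max(color.values())
    labels = [i | o for i in inputs for o in outputs]

    def solve(level, xs):
        if level > n:
            # A move is good if its destination of color c lies in X^c.
            phi = {(lab, v) for lab in labels for v in V
                   if v in xs[color[v]]}
            return enf_pre(V, inputs, outputs, delta, phi)
        x = set(V) if level % 2 == 0 else set()
        while True:
            x_new = solve(level + 1, xs + [x])
            if x_new == x:
                return x
            x = x_new

    return solve(0, [])


# Hypothetical game: vertex 1 has the good color 0, vertices 0 and 2 color 1.
EMPTY, IN_U, OUT_X = frozenset(), frozenset({"u"}), frozenset({"x"})
V = {0, 1, 2}
INPUTS = [EMPTY, IN_U]
OUTPUTS = [EMPTY, OUT_X]
DELTA = {}
for i in INPUTS:
    DELTA[(0, i, OUT_X)] = 1
    DELTA[(0, i, EMPTY)] = 0
    DELTA[(1, i, EMPTY)] = 0
    DELTA[(2, i, EMPTY)] = 2   # sink that only ever sees color 1
COLOR = {0: 1, 1: 0, 2: 1}

print(parity_win(V, INPUTS, OUTPUTS, DELTA, COLOR))
```

The number of nested loops grows with the number of colors, which is one reason the structure of the fixpoint formula matters for efficiency.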


Solving symbolic LTL games using fixpoints 


Let $G$ be a game with LTL objective $\Phi$. We can construct a deterministic parity 
automaton $A$ for $\Phi$, build the product parity game $G \otimes A$, and solve it with the 
help of Equation 4. An alternative approach is to embed the product into the 
fixpoint formula by using vector notation [10]. 

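Consider an example. Let $G = (AP_I, AP_O, V, v_0, \delta, \Phi)$ be a game with $\Phi = 
\mathsf{GF}u \rightarrow (\mathsf{GF}x \wedge \mathsf{GF}y)$. The parity automaton for $\Phi$ is shown in Figure 1. It has 
two states, $q_0$ and $q_1$, and uses three colors. For three colors, the parity fixpoint 
formula in Equation 4 has structure $\nu Z.\mu Y.\nu X$. We index each set variable with 
the state of the automaton, thus $Z$ is split into $Z_0$ and $Z_1$, etc. The formula is: 


$$\begin{bmatrix} W_0 \\ W_1 \end{bmatrix} = \nu \begin{bmatrix} Z_0 \\ Z_1 \end{bmatrix} . \mu \begin{bmatrix} Y_0 \\ Y_1 \end{bmatrix} . \nu \begin{bmatrix} X_0 \\ X_1 \end{bmatrix} . \Box\Diamond \begin{bmatrix} xZ_1 \vee \bar{x}uY_0 \vee \bar{x}\bar{u}X_0 \\ yZ_0 \vee \bar{y}uY_1 \vee \bar{y}\bar{u}X_1 \end{bmatrix} \tag{5}$$


The top row encodes the transitions from state $q_0$ of the parity automaton: 
$q_0 \xrightarrow{x} q_1$ becomes $xZ_1$, $q_0 \xrightarrow{\bar{x}u} q_0$ becomes $\bar{x}uY_0$, $q_0 \xrightarrow{\bar{x}\bar{u}} q_0$ becomes $\bar{x}\bar{u}X_0$. After 
formula evaluation, the variable $W_0$ contains game positions winning for the 
system w.r.t. the parity automaton $A_{q_0}$, while $W_1$ does so w.r.t. $A_{q_1}$. 

Fig. 1. Parity automaton for $\mathsf{GF}u \rightarrow (\mathsf{GF}x \wedge \mathsf{GF}y)$. Transitions are labeled by the 
proposition valuations for which they can be taken as well as the color of the transition. 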

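In general, suppose we are given a game whose winning objective is a deterministic 
parity automaton $(2^{AP}, Q, q_0, \delta)$ with transition function $\delta : Q \times \Sigma \to 
Q \times \mathbb{N}$ that uses $n$ colors $\{0,\ldots,n-1\}$. The set of winning game positions is 
characterized by the fixpoint formula: 


$$\begin{bmatrix} W_1 \\ \vdots \\ W_{|Q|} \end{bmatrix} = \nu \begin{bmatrix} X^0_1 \\ \vdots \\ X^0_{|Q|} \end{bmatrix} . \mu \begin{bmatrix} X^1_1 \\ \vdots \\ X^1_{|Q|} \end{bmatrix} \ldots \sigma \begin{bmatrix} X^{n-1}_1 \\ \vdots \\ X^{n-1}_{|Q|} \end{bmatrix} . \Box\Diamond \begin{bmatrix} \Psi_1 \\ \vdots \\ \Psi_{|Q|} \end{bmatrix} \tag{6}$$

where for all $j \in \{1,\ldots,|Q|\}$, we have $\Psi_j = \bigvee_{x \in 2^{AP}} x \wedge X^{c}_{ind(q')}$ with $(q', c) = \delta(q_j, x)$, 
where $q_j$ denotes the state numbered $j$, and 
where $ind : Q \to \{1,\ldots,|Q|\}$ is some state numbering (one-to-one) that maps 


the initial automaton state $q_0$ to 1. The game is won by the system if and only 
if the initial game position belongs to $W_1$. 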
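
The vector form can be sketched in Python by keeping one set of game positions per automaton state and per color. Everything below is a hedged illustration: the names `enf_pre` and `solve_vector` are hypothetical, the game is made up, and the two-state automaton `aut` is only an assumed reading of a parity automaton for GF u -> (GF x & GF y), since the original figure is not reproduced here.

```python
def enf_pre(V, inputs, outputs, delta, phi):
    """Enforceable predecessor: for all inputs there is an output whose
    (label, successor) pair lies in phi."""
    return {v for v in V
            if all(any((v, i, o) in delta and (i | o, delta[(v, i, o)]) in phi
                       for o in outputs) for i in inputs)}


def solve_vector(V, inputs, outputs, delta, aut, states, n_colors):
    """Return a dict mapping each automaton state q to the game positions
    won by the system when the automaton starts in q."""
    labels = [i | o for i in inputs for o in outputs]

    def body(xs):
        # One row per automaton state: follow the automaton transition on
        # each label and look up the variable of the transition's color.
        rows = {}
        for q in states:
            phi = set()
            for lab in labels:
                q_next, c = aut(q, lab)
                phi |= {(lab, v) for v in xs[c][q_next]}
            rows[q] = enf_pre(V, inputs, outputs, delta, phi)
        return rows

    def solve(level, xs):
        if level == n_colors:
            return body(xs)
        x = {q: (set(V) if level % 2 == 0 else set()) for q in states}
        while True:
            x_new = solve(level + 1, xs + [x])
            if x_new == x:
                return x
            x = x_new

    return solve(0, [])


def aut(q, lab):
    """Assumed automaton: q0 waits for x, q1 waits for y; color 0 on
    progress, color 1 when u is seen without progress, color 2 otherwise."""
    goal = "x" if q == "q0" else "y"
    if goal in lab:
        return ("q1" if q == "q0" else "q0"), 0
    return q, (1 if "u" in lab else 2)


def fs(*props):
    return frozenset(props)

V = {0, 1}
INPUTS = [fs(), fs("u")]
OUTPUTS = [fs(), fs("x"), fs("y"), fs("x", "y")]
DELTA = {}
for i in INPUTS:
    for o in (fs(), fs("x"), fs("y")):
        DELTA[(0, i, o)] = 0       # vertex 0 allows x or y, not both
    for o in (fs(), fs("x")):
        DELTA[(1, i, o)] = 1       # vertex 1 never allows y

W = solve_vector(V, INPUTS, OUTPUTS, DELTA, aut, ["q0", "q1"], 3)
print(W)
```

The product of game and automaton is never built explicitly; it is embedded in the indexing of the set variables.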


3 Chains of Good-for-Games co-Büchi Automata 


This section reviews the chain of good-for-games co-Büchi automata representation 
[19] for omega-regular languages used by our synthesis approach in Section 4. 
Like parity automata, a chain-of-co-Büchi-automata representation of a 
language assigns colors to words. The central difference is that the chain 
representation relies on a sequence of automata, each taking care of a single color. 


Definition 1. Let $L \subseteq \Sigma^\omega$ be an omega-regular language. A falling chain of 
languages $L_1 \supset L_2 \supset \ldots \supset L_n$ is a chain-of-co-Büchi representation of $L$ if 


- every language $L_i$ for $i \in \{1,\ldots,n\}$ is a co-Büchi language, and 
- for every $w \in \Sigma^\omega$, the word $w$ is in $L$ if and only if $w \notin L_1$ or the highest 
  index $i$ such that $w \in L_i$ is even. 


Examples. The universal language $\Sigma^\omega$ has the singleton-chain $L_1 = \emptyset$, and the 
empty language has the chain $(L_1 = \Sigma^\omega) \supset (L_2 = \emptyset)$. The language of the LTL 
formula $\mathsf{GF}a$ over a single atomic proposition $a$ is expressed by the chain $(L_1 = 
L(\mathsf{FG}\bar{a})) \supset (L_2 = \emptyset)$, and $L(\mathsf{FG}a)$ by $(L_1 = \Sigma^\omega) \supset (L_2 = L(\mathsf{FG}a)) \supset (L_3 = \emptyset)$. 
The definition of the natural color of a word from [19] provides a canonical 
way to represent $L$ as a chain of co-Büchi languages $L_1 \supset L_2 \supset \ldots \supset L_n$, which 
uses the minimal number of colors. Moreover, Abu Radi and Kupferman describe 
a procedure to construct a minimal and canonical good-for-games co-Büchi 
automaton for a given co-Büchi language [2]. Thus, every omega-regular language 
has a canonical minimal chain-of-co-Büchi-automata representation (COCOA). 
The canonization procedure in [2, Thm.4.7] ensures the following property. 


Lemma 1 ([2]). Fix a canonical GFG co-Büchi automaton $A$ computed by [2, 
Thm.4.7]. For every state $q$ and letter $x$, either there is 


- exactly one accepting transition, or there are 
- one or more rejecting transitions. In this case: 
  - all successors of $q$ on $x$ share the same suffix language $L'$, i.e., for every 
    two successors $s_1$ and $s_2$ of $q$ on $x$: $L(A, s_1) = L(A, s_2)$, and 
  - for every state $q'$ with suffix language $L'$, there is a rejecting transition 
    to $q'$ from $q$ on $x$. 


Figure 2 on page 12 shows an example of a COCOA. 


Strategies to get back on the track 


Every GFG automaton has a strategy to resolve its nondeterminism such that 
a word is accepted if and only if the run adhering to this strategy is accepting. 
We allow such strategies to diverge for a finite number of steps, and show that 
this divergence does not affect the acceptance by canonical GFG automata. 

Given a COCOA $A^1,\ldots,A^n$, define the natural color of a word to be the 
largest level $l$ such that $A^l$ accepts the word, or 0 if no such $l$ exists. Thus, a 
word is accepted by the COCOA if and only if the natural color is even. 
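
The relation between level-wise acceptance and the natural color is simple enough to state as code. The following Python sketch assumes that the acceptance verdicts of the individual chain automata on a fixed word are already known; the function names are illustrative only.

```python
def natural_color(level_accepts):
    """level_accepts[l - 1] is True iff automaton A^l accepts the word.

    Returns the largest level that accepts the word, or 0 if none does.
    For a falling chain the verdicts are True up to some level and False
    afterwards, but the function does not rely on that.
    """
    color = 0
    for level, accepted in enumerate(level_accepts, start=1):
        if accepted:
            color = level
    return color


def cocoa_accepts(level_accepts):
    """A word is accepted by the chain iff its natural color is even."""
    return natural_color(level_accepts) % 2 == 0


# Chain for "a infinitely often": level 1 is "eventually always not a".
print(cocoa_accepts([False]))        # a recurs: natural color 0
print(cocoa_accepts([True]))         # a stops: natural color 1

# Chain for "eventually always a": level 1 is everything, level 2 is FG a.
print(cocoa_accepts([True, True]))   # natural color 2
print(cocoa_accepts([True, False]))  # natural color 1
```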


GFGness strategies $f^l$. Let $f^l : \Sigma^* \to Q^l$ be a GFG witness resolving 
nondeterminism in $A^l$, for every $l \in \{1,\ldots,n\}$; we call $f^l$ a golden strategy of $A^l$, and 
the induced run for some given word is called its golden run. 


Restrictions $g^l$. The synthesis approach, which will be described later, considers 
combined runs of all automata. Its efficiency depends on the number of reachable 
states in $Q^1 \times \ldots \times Q^n$, so it is beneficial to reduce this number. To this end, 
we introduce a restriction on successor choices. We first define a helpful notion: 
for a co-Büchi automaton $A$ and its state $q$, let $L^{acc}(q)$ denote the set of words 
which have a run from $q$ visiting only accepting transitions. For several automata 
$A^1,\ldots,A^l$ and their states $q^1,\ldots,q^l$, define $L^{acc}(q^1,\ldots,q^l) = \bigcap_{j=1}^{l} L^{acc}(q^j)$. Then, 
for $l \in \{1,\ldots,n\}$, define a restriction function $g^l : Q^l \times \Sigma \times Q^1 \times \ldots \times Q^{l-1} \to 
2^{Q^l}$: for every $q^l$, $x$, $r^1,\ldots,r^{l-1}$, let $g^l(q^l,x,r^1,\ldots,r^{l-1}) = S \subseteq \delta^l(q^l,x)$ (the successors 
of $q^l$ on $x$ in $A^l$) be 
a maximal set such that for every $r^l \in S$ there exists no other $\tilde{r}^l \in S$ with 
$L^{acc}(r^1,\ldots,r^{l-1},\tilde{r}^l) \subseteq L^{acc}(r^1,\ldots,r^l)$. Intuitively, given a current state $q^l$ of 
the automaton $A^l$, a letter $x$, and successor states $r^1,\ldots,r^{l-1}$ of the automata 
on lower levels, the function $g^l$ returns a set of states among which $A^l$ should 
pick a successor. Runs $\rho^1 = q^1_0 q^1_1 \ldots, \ldots, \rho^n = q^n_0 q^n_1 \ldots$ of $A^1,\ldots,A^n$ on a word 
$x_0 x_1 \ldots$ satisfy restrictions $g^1,\ldots,g^n$ if for every level $l \in \{1,\ldots,n\}$ and step 
$i \in \mathbb{N}$: $q^l_{i+1} \in g^l(q^l_i, x_i, q^1_{i+1},\ldots,q^{l-1}_{i+1})$. Strategies $f^l : \Sigma^* \to Q^l$ for $l \in \{1,\ldots,n\}$ 
satisfy restrictions $g^1,\ldots,g^n$ if on every word the strategies yield runs satisfying 
the restrictions. 

The following lemma states that requiring runs of $A^1,\ldots,A^n$ to satisfy the 
restrictions $g^1,\ldots,g^n$ preserves the natural colors and the GFGness. 


Lemma 2. There exist strategies $f^l : \Sigma^* \to Q^l$ for $l \in \{1,\ldots,n\}$ satisfying the 
restrictions $g^1,\ldots,g^n$ such that for every word of a natural color $c$, the strategies 
yield accepting runs $\rho^1,\ldots,\rho^c$ of $A^1,\ldots,A^c$. 


Proof. Fix a word $w$ of a natural color $c$. Each automaton $A^l$ of the chain has a 
GFG witness in the form of a strategy $h^l : \Sigma^* \to Q^l$ to resolve nondeterminism. 
From such strategies and the restrictions $g^1,\ldots,g^n$, we construct the sought 
strategies $f^1,\ldots,f^n$, inductively on the level, starting from the smallest level 1 
and proceeding upwards to $n$. 

Fix $l \in \{1,\ldots,n\}$, and suppose the strategies $f^1,\ldots,f^{l-1}$ are already defined; 
we define the strategy $f^l : \Sigma^* \to Q^l$. Fix a moment $i-1$. Let $q^l_{i-1}$ be the 
state of the run $\rho^l$ proceeding according to $f^l$, $\tilde{q}^l_i = h^l(x_0 \ldots x_{i-1})$ the successor 
state in the original run $\hat{\rho}^l$ according to $h^l$, $q^1_i,\ldots,q^{l-1}_i$ the successor states in 
$\rho^1,\ldots,\rho^{l-1}$ adhering to $f^1,\ldots,f^{l-1}$, and $\tilde{Q}^l_i = g^l(q^l_{i-1},x_{i-1},q^1_i,\ldots,q^{l-1}_i)$ the 
allowed successors on level $l$. Then: 


- if $\tilde{Q}^l_i = \{q^l_i\}$ describes a unique choice, then $f^l(x_0 \ldots x_{i-1}) = q^l_i$ takes it, 

- else $f^l$ picks any $q^l_i \in \tilde{Q}^l_i$ s.t. $L^{acc}(q^1_i,\ldots,q^{l-1}_i,q^l_i) \supseteq L^{acc}(q^1_i,\ldots,q^{l-1}_i,\tilde{q}^l_i)$. 
  Note that such $q^l_i$ always exists because in canonical GFG co-Büchi automata 
  a choice of a nondeterministic transition does not narrow the subsequent 
  nondeterminism resolution. 


We now show that the strategies $f^1,\ldots,f^n$ preserve the natural colors. Fix a 
word $w$. It suffices to prove that the original strategy $h^l$ yields an accepting run 
$\hat{\rho}^l$ if and only if $f^l$ yields an accepting run $\rho^l$. If $\hat{\rho}^l$ is rejecting, then $\rho^l$ is also 
rejecting, for $h^l$ is a witness of GFGness. Now assume that $\hat{\rho}^l$ is accepting. After 
some moment $m$, the runs $\rho^1,\ldots,\rho^{l-1},\hat{\rho}^l$ never make a rejecting transition, hence 
$w_m w_{m+1} \ldots \in L^{acc}(q^1_m,\ldots,q^{l-1}_m,\tilde{q}^l_m)$. Let $m' > m$ be the first moment after $m$ 
when $\rho^l$ visits a rejecting transition; if no such $m'$ exists, we are done. At moment 
$m'$, the strategy $f^l$ picks a successor $q^l_{m'+1}$ such that $L^{acc}(q^1_{m'+1},\ldots,q^l_{m'+1}) \supseteq 
L^{acc}(q^1_{m'+1},\ldots,q^{l-1}_{m'+1},\tilde{q}^l_{m'+1})$. Since $w_{m'+1} \ldots \in L^{acc}(q^1_{m'+1},\ldots,q^{l-1}_{m'+1},\tilde{q}^l_{m'+1})$, that 
suffix also belongs to the larger $L^{acc}$ w.r.t. $q^l_{m'+1}$. Hence the run $\rho^l$ is accepting. 



Get-back strategies $\hat{f}^l$. We now consider runs that diverge from golden runs. 
Given an individual strategy $f^l : \Sigma^* \to Q^l$, define $\hat{f}^l : \Sigma^* \times Q^l \times \Sigma \to Q^l$ to be 
a strategy-like function which, when presented with a choice, makes the same 
choice as $f^l$. Formally: for every $p \in \Sigma^*$, $q \in Q^l$ reachable from the initial state 
on reading $p$, and $x \in \Sigma$, the value $\hat{f}^l(p,q,x) = f^l(p \cdot x)$ if $A^l$ needs to take a 
rejecting transition from $q$ on $x$, otherwise there is no choice to be made and 
$\hat{f}^l(p,q,x) = q'$ for the unique successor $q'$ of $q$ on reading $x$. It follows from 
properties of canonical GFG automata (Lemma 1) that every successor chosen 
by $\hat{f}^l$ satisfies the transition relation of $A^l$. We now prove that it is sufficient to 
adhere to $\hat{f}^l$ only eventually. 


Lemma 3. Fix a COCOA and a word $w$. For $l \in \{1,\ldots,n\}$, suppose $A^l$ on $w$ 
has a rejecting run $\rho^l$ that eventually adheres to $\hat{f}^l$, where $\hat{f}^l$ is constructed from 
$f^l$ of Lemma 2. Then $A^l$ rejects $w$. 


The proof is based on Lemma 1, which implies that two diverging runs of a 
canonical GFG automaton on the same word can always be converged once a 
rejecting transition is taken. 


Proof. For $l = 0$ the claim trivially holds; assume $l > 0$. Let $\rho^l_g$ be the golden 
run of $A^l$ on the word. Let $m$ be the moment starting from which $\rho^l$ adheres 
to the golden strategy of $A^l$. Let $n$ be the first moment $n > m$ when $A^l$ makes 
a rejecting transition: by properties of canonical GFG automata (Lemma 1), 
there must be a rejecting transition to the same state as in $\rho^l_g$. The strategy $\hat{f}^l$ 
moves the automaton $A^l$ in $\rho^l$ into the same state at moment $n+1$ as it is in 
$\rho^l_g$. Afterwards, the strategy $\hat{f}^l$ ensures that $A^l$ in $\rho^l$ follows exactly the same 
transitions as $A^l$ in $\rho^l_g$. Hence, the golden run $\rho^l_g$ is rejecting: $A^l$ rejects $w$. 


COCOA product 


In this section, we compose individual automata of COCOA into a product which 
is a good-for-games alternating parity automaton [9]. The results above imply 
that the languages of a COCOA and its product coincide. Later we use COCOA 
products to solve games with LTL objectives. 


Alternating automata. A simple¹ alternating parity automaton $(\Sigma,Q,q_0,\delta)$ has 
a transition function of type $\delta : Q \times \Sigma \to 2^Q \times \mathbb{N} \times \{rej, acc\}$. For instance, 
$\delta(q,x) = (\{q_1,q_2\},1,rej)$ means that from state $q$ on reading letter $x$ there are 
transitions to $q_1$ and $q_2$, both labelled with color 1, and the choice between $q_1$ 
and $q_2$ is controlled by the rejector player. There are two players, rejector and 
acceptor, and the acceptance of a word $w = x_0 x_1 \ldots$ is defined via the following 
word-checking game. Starting in $q_0$, the two players resolve nondeterminism and 
build a play $(q_0,c_0,pl_0,q_1)(q_1,c_1,pl_1,q_2)\ldots$: suppose the play sequence is in state 
$q_i$, let $\delta(q_i,x_i) = (Q_{i+1},c_i,pl_i)$: if $pl_i = rej$ then the rejector chooses a state 
$q_{i+1} \in Q_{i+1}$, otherwise the acceptor chooses. The play sequence is then extended 
by $(q_i,c_i,pl_i,q_{i+1})$ and the procedure repeats from state $q_{i+1}$. The play is won 
by the acceptor if the minimal color appearing infinitely often in $c_0 c_1 \ldots$ is even 
(min-even acceptance), otherwise it is won by the rejector. The word-checking 
game is won by the acceptor if it has a strategy $f_w : Q^* \to Q$ to resolve its 
nondeterminism to win every play; otherwise the game is won by the rejector, 
who then also has a winning strategy. Note that although the acceptor strategy 
does not know the rejector choices beforehand, it knows the word $w$. The word 
is accepted by the automaton if the word-checking game is won by the acceptor. 

¹ 'Simple' refers to a simpler form of the transition function. We use $\delta : Q \times \Sigma \to 
2^Q \times \mathbb{N} \times \{rej, acc\}$ while the general form is $\delta : Q \times \Sigma \to \mathcal{B}^+(Q)$ plus parity 
assignment $Q \times \Sigma \times Q \to \mathbb{N}$. We forbid mixing conjunctions and disjunctions. 

A simple alternating automaton is good-for-games, abbreviated A-GFG, if 
the acceptor player has a strategy $f_{acc} : (Q \times \Sigma)^* \to Q$ to win the word-checking 
game for every accepted word, and the rejector player has a strategy 
$f_{rej} : (Q \times \Sigma)^* \to Q$ winning for every rejected word. These strategies depend 
only on the currently seen word prefix, not the whole word. We remark that our 
definition of GFGness differs from [9] but they show the equivalence [9, Thm.8]. 


COCOA product. The product is built in three steps. First, we define a naive 
product, which combines individual chain automata into A-GFG in a straightfor- 
ward way. The naive product may contain states whose removal does not affect 
its language, hence in the second step we define a product with reduced sets of 
states and transitions. In turn, the reduced product may miss transitions ben- 
eficial for synthesis. Therefore, in the last step, we enrich the reduced product 
with transitions to derive the optimized, and final, COCOA product. 

Given a COCOA $A^l = (\Sigma, Q^l, q^l_0, \delta^l)$ with $l \in \{1,\ldots,n\}$, the naive COCOA 
product is the following simple alternating parity automaton $(\Sigma, Q, q_0, \delta)$. Each 
state is a tuple from $Q^1 \times \ldots \times Q^n$, $q_0 = (q^1_0,\ldots,q^n_0)$, and the set of states 
consists of those reachable from the initial state under the transition relation 
defined next. The transition relation $\delta : Q \times \Sigma \to 2^Q \times \mathbb{N} \times \{rej, acc\}$ simulates 
individual automata of the COCOA. Consider an arbitrary $(q^1,\ldots,q^n) \in Q$, 
$x \in \Sigma$; let $r$ be the smallest number such that $A^r$ has a rejecting transition from 
$q^r$ on reading $x$, i.e., $(q^r,x,\tilde{q}^r,1) \in \delta^r$ for some $\tilde{q}^r \in Q^r$; if no such level exists, set $r$ to $n+1$. 
By abuse of notation, define $\delta^l(q^l,x) = \{\tilde{q}^l \mid \exists p : (q^l,x,\tilde{q}^l,p) \in \delta^l\}$ to be the set 
of successor states of $q^l$ on reading $x$ in $A^l$. Let $pl^r$ be $rej$ for odd $r$ and $acc$ for 
even $r$. Then, $\delta((q^1,\ldots,q^n),x) = (\tilde{Q}, r-1, pl^r)$, where: 


$$\tilde{Q} = \{(\tilde{q}^1,\ldots,\tilde{q}^n) \mid \tilde{q}^l \in \delta^l(q^l,x) \text{ for every } l\}.$$


Notice that the automata on levels $l < r$ have unique successors ($\tilde{q}^l$ is unique) as 
their transitions are accepting and hence deterministic (by Lemma 1 on page 8). 
The automata on levels $l \geq r$ may need to resolve nondeterminism, which is 
done by a single player $pl^r$ in the product. 
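
One step of the naive product can be sketched in Python. The encoding of each co-Büchi automaton as a list of `(state, letter, successor, color)` tuples and the name `naive_step` are assumptions made for this illustration. The two-level chain used below is a hand-built example for the language "eventually always a" (level 1 accepts every word, level 2 accepts FG a); it is not taken from the figures of the paper.

```python
from itertools import product

REJECTING, ACCEPTING = 1, 2   # co-Büchi colors of individual transitions


def naive_step(automata, state, letter):
    """Return (successor tuples, color, controlling player) of the naive
    product for the given product state and letter."""
    n = len(automata)
    r = n + 1                  # smallest level with a rejecting transition
    successor_sets = []
    for level, (transitions, q) in enumerate(zip(automata, state), start=1):
        enabled = [(q2, c) for (q1, x, q2, c) in transitions
                   if q1 == q and x == letter]
        successor_sets.append(sorted({q2 for q2, _ in enabled}))
        if r == n + 1 and any(c == REJECTING for _, c in enabled):
            r = level
    player = "rej" if r % 2 == 1 else "acc"
    return set(product(*successor_sets)), r - 1, player


# Letters: "a" (a is true) and "n" (a is false).
A1 = [("p", "a", "p", ACCEPTING), ("p", "n", "p", ACCEPTING)]   # all words
A2 = [("s", "a", "s", ACCEPTING), ("s", "n", "s", REJECTING)]   # FG a

print(naive_step([A1, A2], ("p", "s"), "a"))
print(naive_step([A1, A2], ("p", "s"), "n"))
```

Reading `a` yields color 2 and reading `n` yields color 1, so under min-even acceptance the product accepts exactly the words in which `n` occurs only finitely often, as expected for FG a.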

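The reduced COCOA product is defined by replacing the definition of $\tilde{Q}$ by 


$$\tilde{Q} = \{(\tilde{q}^1,\ldots,\tilde{q}^n) \mid \tilde{q}^l \in g^l(q^l,x,\tilde{q}^1,\ldots,\tilde{q}^{l-1}) \text{ for every } l\}$$


where the restriction function $g^l$ was defined on page 9. As a result, this set $\tilde{Q}$ has 
no two states $(\tilde{q}^1,\ldots,\tilde{q}^n)$ and $(\tilde{r}^1,\ldots,\tilde{r}^n)$ with $L^{acc}(\tilde{q}^1,\ldots,\tilde{q}^n) \subseteq L^{acc}(\tilde{r}^1,\ldots,\tilde{r}^n)$. 
The set of states of the reduced COCOA product is the set of states from 
$Q^1 \times \ldots \times Q^n$ reachable under the above definition. 

[Figure 2: automata $A^1$, labeled $\mathsf{FG}\bar{x} \vee \mathsf{FG}\bar{y}$, and $A^2$, labeled $\mathsf{FG}\bar{x}\bar{u} \vee \mathsf{FG}\bar{y}\bar{u}$; the diagram is not recoverable from the source.] 

Fig. 2. COCOA for the language $\mathsf{GF}u \rightarrow (\mathsf{GF}x \wedge \mathsf{GF}y)$. Rejecting transitions are dashed. 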

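Finally, given a reduced COCOA product $(\Sigma, Q, q_0, \delta_R)$, we now define the 
optimized COCOA product $(\Sigma, Q, q_0, \delta_O)$. It has the same states $Q$ as the reduced 
product but adds transitions. For $(q^1,\ldots,q^n) \in Q$, $x \in \Sigma$, let $(\tilde{Q}_R, r-1, pl^r) = 
\delta_R((q^1,\ldots,q^n),x)$. Then $\delta_O((q^1,\ldots,q^n),x) = (\tilde{Q}, r-1, pl^r)$, where 


$$\tilde{Q} = \tilde{Q}_R \cup \big\{(\tilde{q}^1,\ldots,\tilde{q}^n) \in Q : \; \forall l \in \{1,\ldots,r-1\} : \tilde{q}^l \in \delta^l(q^l,x) \;\wedge\; \forall l \in \{r,\ldots,n\}.\, \exists (\tilde{q}^1_R,\ldots,\tilde{q}^n_R) \in \tilde{Q}_R : L(\tilde{q}^l) = L(\tilde{q}^l_R)\big\}.$$


In the first condition, the successor $\tilde{q}^l$ for $l \leq r-1$ is uniquely defined. The second 
condition on levels higher than $r-1$ allows for state jumping. 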


Lemma 4. For every COCOA, the optimized product is A-GFG and has the 
same language as the COCOA. 


Proof. We describe two strategies, face : (Q x X)” — Q for the acceptor and 
frei : (Q x X)* 2 Q for the rejector, and prove two claims: for every word, 
(1) if the word is accepted by COCOA, the acceptor wins the word-checking 
game using facc, (2) if the word is rejected by COCOA, the rejector wins the 
word-checking game using frej: The lemma follows from these claims. 

We define face: Given a finite history h = ((q}, ..., qt), £1)---((qł, ... q?), Vi); 
let facc(h) = (do oso tia) where for l= 1,..., n: 


— if l is even: dlya = f nite dg £i); 
— if l is odd, pick arbitrary q!,, € g'(q1,,,..., dyi, ql). 


The strategy $f_{rej}$ is built similarly, but $f^l$ is used for odd $l$. Finally, the two items 
are proven by contraposition, applying Lemma 3. 


Example. Figure 3 shows the optimized product for COCOA in Figure 2. 


4 Solving LTL Games Using Chain of co-Büchi Automata 


This section shows how to solve symbolic games with LTL objectives by going 
through COCOA. For a given LTL specification, we construct a deterministic 
parity automaton and then a COCOA using the effective procedure of [19]. We 
then compute the COCOA product. Next, we encode the symbolic game with 
a COCOA product objective into a fixpoint formula. This step is simple 
because the COCOA product is a good-for-games alternating automaton, and 
such automata are composable with games [9, Thm. 8]. Finally, we show that the 
GR(1) fixpoint equation is a special case of the COCOA fixpoint formula. 

Fig. 3. Optimized COCOA product for $GFu \to (GFx \wedge GFy)$. It has only two 
nondeterministic transitions, connecting $(q_0,p_0)$ and $(q_1,p_1)$, controlled by the rejector. 
For instance, $\delta((q_0,p_0), x) = (\{(q_0,p_0),(q_1,p_1)\}, 0, rej)$. 


Fixpoint formula for games with COCOA objectives 


Given a game with an objective in the form of an optimized COCOA product 
$(2^{AP}, Q, q_0, \delta)$, we construct a fixpoint formula that characterizes the set of 
winning positions. Since the COCOA product is a good-for-games parity automaton, 
the formula resembles Equation 6. It has the structure $\nu X^0.\mu X^1.\ldots X^n$, where 
$n+1$ is the number of colors in the COCOA product, and the operators $\nu$ and 
$\mu$ alternate. As before, we use the vector notation, and split each variable $X^l$ 
into $|Q|$ variables $X^l_1,\ldots,X^l_{|Q|}$, one per state of the COCOA product, and the 
$k$th row in the fixpoint formula encodes transitions from state $q_k$ of the product. 
Let $ind: Q \to \{1,\ldots,|Q|\}$ be some one-to-one state numbering with the initial 
state of the COCOA product mapped to 1, and let $OP^{pl}$ denote $\bigvee$ when $pl$ is 
$acc$; otherwise it is $\bigwedge$. The following fixpoint formula computes, for each state q 
of the COCOA product, the set Wina(q) of game positions from which the system 
player wins the game wrt. the COCOA product whose initial state is set to q: 


W; Xi Xi XT V 
: =v| : LB] : -o| : |.0O] : |, where for all j: (7) 
0 1 n 
Wigi Xal LX XQ) ial 
ENS pl c 
w= V (AOP Xiao) 
x e 2AP 


let (Q, c, pl) = ó(qj, x) 


The game wrt. the COCOA product is won by the system player if and only 
if $v_0 \in W_1$. Since the languages of COCOA and its optimized product coincide 
(Lemma 4), we arrive at the following theorem. 

Theorem 1. A game with an LTL objective $\varphi$ is won by the system if and only if 
the initial game position belongs to $W_1$ computed by Equation 7 for the optimized 
COCOA product for $\varphi$. 


Example. Consider the LTL specification $GFu \to (GFx \wedge GFy)$. The optimized 
product contains only states $(q_0,p_0)$ and $(q_1,p_1)$. The fixpoint formula is: 


y |20] | Yoo} ,, | Xoo tLZooZ11 V ZuYoo V TUX 
` í i UZooZii V guYu V gyuXi 


where the subscript index $ij$ denotes a state $(q_i,p_j)$ of the optimized COCOA 
product. The LTL game is won by the system if and only if, at the end of 
evaluation, the initial game position $v_0$ belongs to $Z_{00}$. This formula has a structure 
similar to the GR(1) Equation 3; in particular, it uses the conjunction over $Z$ 
variables, which reduces the number of fixpoint iterations. In 
contrast, the parity formula in Equation 5 misses this acceleration. 


GR(1) synthesis as a special case 


We argue that for GR(1) specifications, the COCOA fixpoint Equation 7 
becomes similar, in spirit, to the GR(1) fixpoint Equation 2. Consider a GR(1) 
formula $\bigwedge_{i=1}^{m} GFa_i \to \bigwedge_{j=1}^{n} GFg_j$. Its COCOA has two automata, $\mathcal{A}^1$ and $\mathcal{A}^2$. 
The automaton $\mathcal{A}^1$ accepts exactly the words that violate one of the guarantees, 
while $\mathcal{A}^2$ accepts exactly the words that violate one of the guarantees and one 
of the assumptions. In order to reason about the number of states in canonical 
automata, we assume henceforth that in the GR(1) formula, no assumption implies 
another assumption or guarantee, and no guarantee implies another guarantee. 
The structures of $\mathcal{A}^1$ and $\mathcal{A}^2$ are as follows. The automaton $\mathcal{A}^1$ has one state per 
guarantee ($n$ in total), while $\mathcal{A}^2$ has one per combination of liveness assumption 
and guarantee ($m \cdot n$ in total). The optimized COCOA product has exactly one 
state for each assumption-guarantee combination, $m \cdot n$ in total, versus $n \cdot (m \cdot n)$ 
for the non-optimized product. Let $\{1,\ldots,m\} \times \{1,\ldots,n\}$ be the states of the 
optimized product, and let $(1,1)$ be initial. For each state $(i,j)$: 


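- for every $x \models \bar{a}_i \bar{g}_j$: $\delta((i,j), x) = (\{(i,j)\}, 2, rej)$, 
- for every $x \models a_i \bar{g}_j$: $\delta((i,j), x) = (\{(i',j) \mid i' \in \{1,\ldots,m\}\}, 1, acc)$, and 
- for every $x \models g_j$: $\delta((i,j), x) = (\{1,\ldots,m\} \times \{1,\ldots,n\}, 0, rej)$. 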


The fixpoint formula for such COCOA product has the form: 


Wii Zii Yii Xii pia 
: : : QAS , where for all i, j: 
The conjunction $\bigwedge_{i',j'} Z_{i',j'}$ and disjunctions $\bigvee_{i'} Y_{i',j}$ enable faster information 
propagation, which results in a smaller number of fixpoint iterations. Such 
information sharing is present in the GR(1) fixpoint Equation 2, and it is in this sense that the 
COCOA approach generalizes the GR(1) approach. In contrast, the parity fixpoint 
formula for GR(1) specifications misses this acceleration. 

We now optimize the equation to reduce the number of variables. First, we 
introduce variables $Y_j$ and $Z_j$, for $j \in \{1,\ldots,n\}$, and transform the formula into 


WA Zi Yi V; Pia 
MEI ELSE BENI : , where 
Pia X13 113 
: =v 0 BRDEO| : |, where 
Pm,n Xm,n Win 
big =g NZ) V ag;Y; V ügiXi 
j'€(1,...,m) 


Note that for every $i \in \{1,\ldots,m\}$, the value $W_{i,j}$ computed by the old formula 
equals the value $W_j$ computed by the new formula ($W_{i,j} = W_j$), where $j \in 
\{1,\ldots,n\}$. We then introduce a fresh variable $Z$, and transform the formula to: 


W —vZ. n V;, where 


j€(1,...,n) 
V, Yı V; Pia 
| =yuļ|:]. : , where 
Pia X14 NZ V any V agi Xia 
: =v : DO : 
Py in Xm,n InZ V QmgnYa V ämJnXm,n 


After this transformation, we have $W = W_j$ for every $j \in \{1,\ldots,n\}$. Finally, 
the last equations can be folded into the formula 


W —vZ. A uY. N vX.00 [gZ V agY v aig;X] 
j=l i=1 


which is equal to Equation 2 modulo expressions in front of the variables. Our 
prototype tool implements a generalized version of such formula optimization. 


5 Evaluation 


Evaluation goals are: (G1) show that standard LTL synthesizers do not fit our 
synthesis problem, (G2) compare our approach against a specialized GR(1) 
synthesizer, and (G3) compare the COCOA approach against the parity approach. 

We implemented the COCOA and parity approaches in a prototype tool reboot. 
It uses SPOT [16] to convert LTL specifications (the liveness part of GR(1)) 
to deterministic parity automata. From these, reboot builds COCOA using the 
construction described in [19]. The COCOA is then compiled into the fixpoint 
formula of Equation 7 and symbolically evaluated on the game graph. For the symbolic 
encoding of game positions and transitions, we use the BDD library CUDD [32]. 

We compare our approaches with the GR(1) synthesis tool slugs [18] and the 
LTL synthesis tool strix [26], which represent the state of the art. The 
experiments were performed on a Linux machine with an AMD EPYC 7502 processor; the 
timeout was set to 1 hour. For the comparison, we collected existing 
and created new benchmarks: AMBA, lift, and robot on a grid. Each 
specification is written in an extension of the slugs format: it encodes a symbolic game 
graph using logical formulas over system and environment propositions, and an 
LTL property on top of it. In total, there are 80 benchmarks, all realizable. 


The evaluation data is available at https://doi.org/10.5281/zenodo.10448487 


AMBA and lift. We use two parameterized benchmarks inspired by [8], each 
having two versions, a GR(1) and an LTL version. The first specification 
encodes an elevator behaviour and is parameterized by the number of floors. Its 
GR(1) specification has one liveness assumption and a parameterized number 
of guarantees ($GF \to \bigwedge_i GF$). Lift's LTL version adds an additional 
request-response assumption and has the form $GF \wedge (GF \to GF) \to \bigwedge_i GF$, which requires 
5 parity colors. There are 24 GR(1) instances and 21 LTL instances, with the 
number of Boolean propositions ranging from 7 to 34. The AMBA specification 
describes the behaviour of an industrial on-chip bus arbiter serving a 
parameterized number of clients. Its GR(1) version has the shape $GF \to \bigwedge_i GF$; our 
new LTL modification replaces one safety guarantee $y$ by $FGy$, which allows 
the system to violate it during some initial phase, and we add an assumption 
of the form $GF \to GF$. Overall, AMBA's LTL specification has the form 
$GF \wedge (GF \to GF) \to FG \wedge \bigwedge_i GF$, and requires 7 parity colors. There are 14 GR(1) 
instances and 7 LTL instances; the number of Boolean propositions is 22 for the 
specification serving two clients, and 77 for the 15-client version. 


Robot on a grid. This benchmark describes a standard scenario from the robotics 
domain: a robot moves on a grid, there are walls, doors, pickup and delivery 
locations, and a moving obstacle. When requested, the robot has to pick up a 
package and deliver it to the target location, while avoiding collisions with the 
walls and the obstacle and passing through the doors only when they are open. 
The GR(1) specification has a parameterized number of assumptions and 
guarantees: $\bigwedge_i GF \to \bigwedge_j GF$. The LTL version introduces preferential paths: the robot 
has to eventually always use its preferred path, assuming that the moving obstacle only moves 
along its own preferred path. This yields the shape $FG \wedge \bigwedge_i GF \to FG \wedge \bigwedge_j GF$ (5 
colors). There are 16 maps of size 8×16 with a varying number of delivery-pickup 
locations and doors. The number of Boolean propositions ranges from 24 to 53. 
Fig. 4. From left to right: (G1) Cactus plot comparing our approaches with LTL 
synthesizer strix [26]; (G2a) Comparing COCOA-based approach with GR(1) synthesizer 
slugs [17]; (G2b) The same but excluding LTL-to-parity translation time; (G3) 
Comparing COCOA and parity approaches (excluding LTL-to-parity translation time). 


G1: Comparing with LTL synthesizer. Figure 4 shows a cactus plot. On these 
problems, the LTL synthesizer strix is slower than specialized solvers. The 
reason is the sheer number of states in benchmark game arenas: e.g., benchmark 
amba15 uses 77 Boolean propositions, yielding a naive estimate of the game arena 
size of $2^{77}$ states. Solver strix tries to construct an explicit-state automaton 
describing this game arena and the LTL property, which is a bottleneck. In 
contrast, symbolic solvers like slugs or reboot represent game arenas symbolically 
using BDDs, and reboot constructs explicit automata only for LTL properties. 
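
To give a feeling for the size of this estimate, the following short Python calculation evaluates it. It is only an illustration: the helper name `naive_arena_size` is hypothetical, and the bound assumes, as the naive estimate does, that every Boolean proposition contributes an independent bit to a game position.

```python
def naive_arena_size(num_props: int) -> int:
    """Upper bound on the number of positions if every proposition is a free bit."""
    return 2 ** num_props


amba15_size = naive_arena_size(77)
print(f"{amba15_size:.2e}")  # on the order of 10^23 positions
```

An explicit-state construction would have to enumerate a substantial part of this space, whereas a BDD-based encoding only needs a representation of the transition relation over the 77 variables.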


G2: Comparing with GR(1) synthesizer. The second diagram in Figure 4 
compares the COCOA approach with slugs on the GR(1) benchmarks. The diagram 
shows the total solving time, including the time reboot spends calling SPOT to 
translate the GR(1) liveness formula to a parity automaton. On the lift examples, most 
of the time is spent in this translation when the number of floors exceeds 15: 
for instance, on benchmark lift20, reboot spent 650 out of a total of 670 seconds in 
translation. If we count only the time spent in fixpoint evaluation (a 
more appropriate measure, since GR(1) liveness formulas have a fixed structure), 
the performances are comparable; see the third diagram. 


G3: COCOA vs. parity. The last diagram in Figure 4 compares COCOA and 
parity approaches on all the benchmarks, and shows that the COCOA approach 
is significantly faster than the parity one. We note that on these examples, the 
number of states in the optimized COCOA product was equal to or less than 
the number of states in the parity automaton. At the same time, the number of 
fixpoint iterations performed by the COCOA approach was always significantly 
smaller than for the parity one. Intuitively, this is due to the structure of COCOA 
fixpoint equation that propagates information faster than the parity one. 


Remarks. We did not compare with other symbolic approaches for solving parity 
or Rabin games [15,14,6]: although they use symbolic algorithms, as input these 
tools require games in explicit form or their game encoding separates positions 
into those of player-1 and player-2; both significantly affect the performance. 

While all our benchmarks were realizable, the prototype tool was system- 
atically compared against other approaches on both realizable and unrealizable 
random specifications using fuzz testing. 
Abstract. We present an innovative approach to the reactive synthesis 
of parity automaton specifications, which plays a pivotal role in the 
synthesis of linear temporal logic. We find that our method efficiently 
solves the SYNTCOMP synthesis competition benchmarks for parity 
automata from LTL specifications, solving all 288 models in under a 
minute. We therefore direct our attention to optimizing the circuit size 
and propose several methods to reduce the size of the constructed circuits: 
(1) leveraging different parity game solvers, (2) applying bisimulation 
minimisation to the winning strategy, (3) using alternative encodings from 
the strategy to an and-inverter graph, (4) integrating post-processing with 
the ABC tool. We implement these methods in the Knor tool, which has 
secured us multiple victories in the PGAME track of the SYNTCOMP 
competition. 


Keywords: Reactive synthesis · Parity games · Binary decision diagrams 


1 Introduction 


Reactive synthesis as first stated by Church [8,9] and outlined in [32] is the 
act of automatically constructing a reactive system such that all interactions 
with an unknown environment satisfy a linear temporal logic (LTL) specification. 
While early solutions were proposed to solve the synthesis problem via finite- 
state automata [7], until recently reactive synthesis using deterministic parity 
automata and parity games was deemed infeasible in practice, in part due to 
the lack of efficient translations from LTL to deterministic ω-automata. With 
the rise of direct translations, LTL synthesis tools such as ltlsynt [27,33,34] and 
Strix [26] are capable of solving a wide range of specifications via deterministic 
parity automata and parity games, and perform better than some of the previous 
techniques avoiding deterministic parity automata. 

The advantage of reactive synthesis is that synthesized systems are correct 
by construction and therefore need to be neither tested nor model checked for 
correctness. The reactive synthesis competition (SYNTCOMP) was founded to 
increase the impact of reactive synthesis in industry and improve the quality of 
synthesis tools [22,23]. Motivated by the new PGAME track in the SYNTCOMP 
competition, we seek to use the Oink parity game solver [11] in the competition and 
to implement the necessary infrastructure that translates the parity automata 
of the competition into parity games suitable for Oink, and that translates 
the winning strategy computed by Oink into a Boolean circuit. We name this 
implementation Knor¹. 

Knor leverages Oink to solve parity games with state-of-the-art parity game 
solvers [16], and the Sylvan binary decision diagrams (BDD) package [14] to 
implement most of the steps before and after solving and a purely symbolic parity 
game solver based on [25]. The techniques implemented in Knor have secured us 
multiple victories in the SYNTCOMP competition, in 2021, 2022 and 2023. 

Following the initial success of Knor in the competition, we observe a major 
difference with the main competitors ltlsynt and Strix. While Knor can solve all 
benchmarks in a remarkably short time, the constructed circuits are sometimes 
several orders of magnitude larger than the circuits constructed by other tools. 
Thus, we propose several techniques, mostly symbolic techniques that rely on 
binary decision diagrams, to reduce the size of the constructed circuits. 


Contribution. We present the Knor tool that solves the synthesis problem of 
parity automata to Boolean circuits, built around the parity game solver Oink. 
We consider three methods to translate the given parity automaton to a parity 
game, and present a novel symbolic approach that improves upon an explicit 
translation by several orders of magnitude. As Oink implements several parity 
game solvers that have been shown in [16] to perform well for parity games 
derived from reactive synthesis benchmarks, we consider whether changing the 
algorithm impacts the size of the constructed circuit. We study whether applying 
bisimulation minimisation as in [15], which aims to minimize the number of states 
of the winning strategy after solving the parity game, can reduce the size of the 
circuits. Similarly, we study different encodings from the winning strategies into 
Boolean logic, in particular whether a one-hot encoding of the states improves 
the circuit size. Finally, we apply a similar post-processing step as Strix by using 
the ABC tool [4,5] to minimize the constructed circuit after encoding it as an 
and-inverter graph. Sec. 3 describes Knor and provides accessible descriptions of 
the implemented techniques. We evaluate these techniques in Sec. 4. We discuss 
our findings in Sec. 5. 


2 Preliminaries 


Given two disjoint sets of Boolean variables $I$ and $O$ representing input and output 
signals, and an ω-regular language $L$ of infinite words over the alphabet $2^{I \cup O}$ 
representing a specification, the reactive synthesis problem asks us to construct 
a controller that enforces $L$. The controller is a function $(2^{I \cup O})^* \times 2^I \to 2^O$ that 
yields a valuation of the output signals $2^O$ based on a history of input and output 
signals $(2^{I \cup O})^*$ and the current input signals $2^I$. 

While we are interested in the broader context of the synthesis of reactive 
systems that enforce specifications given in linear temporal logic (LTL), we 
assume in this paper that $L$ is given as a deterministic parity automaton. LTL 
specifications can be translated to a parity automaton of doubly-exponential size. 

¹ Knor is the Dutch word for the sound that a pig makes, i.e., “oink”. 

Deterministic parity automata (DPA) are ω-regular automata that accept 
ω-regular languages. A DPA is a tuple $(Q, q_0, AP, \Delta, F)$, where $Q$ is a finite 
set of states, $q_0 \in Q$ is the initial state, $AP$ is a set of atomic propositions, 
$\Delta \subseteq Q \times 2^{AP} \times Q$ is the transition relation and $F: Q \to \mathbb{N}$ assigns to each state 
a priority. A run of the automaton is an infinite sequence of states consistent 
with the transition relation. A run is accepting if and only if the maximum 
priority that occurs infinitely often along the run is an even number. We define 
parity automata with priorities on states. Alternatively, priorities can also be on 
transitions. 
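
The acceptance condition can be made concrete for ultimately periodic words, i.e., words of the form prefix followed by an endlessly repeated cycle. The following Python sketch is an illustration only: the class `DPA`, the method `accepts_lasso`, and the two-state example automaton are hypothetical names and data, and the transition relation is assumed to be given as a total function from (state, letter) pairs to states.

```python
class DPA:
    """Deterministic parity automaton with priorities on states (max-even)."""

    def __init__(self, initial, delta, priority):
        self.initial = initial      # initial state q0
        self.delta = delta          # dict: (state, letter) -> successor state
        self.priority = priority    # dict: state -> natural number

    def accepts_lasso(self, prefix, cycle):
        """Decide acceptance of the word prefix . cycle^omega (cycle non-empty)."""
        state = self.initial
        for letter in prefix:
            state = self.delta[(state, letter)]
        seen = {}   # state at the start of a cycle traversal -> traversal index
        tops = []   # highest priority visited during each traversal
        while state not in seen:
            seen[state] = len(tops)
            top = -1
            for letter in cycle:
                state = self.delta[(state, letter)]
                top = max(top, self.priority[state])
            tops.append(top)
        # Traversals from index seen[state] onwards repeat forever, so their
        # priorities are exactly the ones that occur infinitely often.
        return max(tops[seen[state]:]) % 2 == 0


A, NOT_A = frozenset({"a"}), frozenset()
# Hypothetical two-state DPA for "a holds infinitely often" (GF a).
gf_a = DPA(
    initial="q0",
    delta={**{(q, A): "q1" for q in ("q0", "q1")},
           **{(q, NOT_A): "q0" for q in ("q0", "q1")}},
    priority={"q0": 1, "q1": 2},
)
print(gf_a.accepts_lasso([], [A]))        # the word a^omega
print(gf_a.accepts_lasso([A], [NOT_A]))   # a followed by (not a)^omega
```

In the example, the state with priority 2 is visited infinitely often exactly when `a` holds infinitely often, so the first word is accepted and the second is rejected.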

A parity game is a DPA with two players Even and Odd, where the set of 
states $Q$ is partitioned into two sets $Q_0$ and $Q_1$. In this paper, we refer to the 
states of the parity game as vertices and the transitions of the parity game as 
edges. A run on a parity game is an infinite sequence of vertices where player 
Even decides the next vertex if the current vertex is in $Q_0$, and player Odd if 
it is in $Q_1$. A fundamental result for parity games is that they are memoryless 
determined [18], i.e., each vertex is winning for exactly one player, and both 
players have a positional strategy for each of their winning vertices. 

To solve the synthesis problem, given a deterministic parity automaton over 
$AP = I \cup O$, we construct a parity game by splitting the automaton across $I$ 
and $O$, letting one player (the environment) choose a valuation of variables in $I$ 
and the other player (the controller) a valuation of variables in $O$. 

The result of reactive synthesis is a Boolean circuit, structured as an and- 
inverter graph (AIG). An AIG is a directed acyclic graph, featuring terminal 
nodes that denote Boolean inputs (input signals and latches), internal nodes 
representing AND-gates, and edges with complementation for logical negation. 
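
As an illustration of this structure, the following Python sketch evaluates a small AIG. It uses the literal numbering of the AIGER format (variable v has literal 2v, its negation has literal 2v+1, and literal 0 is the constant false). The function `eval_aig`, the gate representation as (variable, literal, literal) triples, and the XOR example are assumptions made for this sketch; latches are treated like ordinary inputs.

```python
def eval_aig(inputs, and_gates, outputs):
    """Evaluate an and-inverter graph for one valuation.

    inputs:    dict variable index -> bool (input signals and latches)
    and_gates: list of (variable, literal, literal) in topological order
    outputs:   list of literals
    """
    value = {0: False}          # variable 0 is the constant false
    value.update(inputs)

    def lit(literal):
        v = value[literal >> 1]                   # variable index = literal // 2
        return (not v) if (literal & 1) else v    # odd literal = complemented edge

    for var, a, b in and_gates:
        value[var] = lit(a) and lit(b)
    return [lit(o) for o in outputs]


# x XOR y with inputs x = variable 1 and y = variable 2:
#   g3 = x AND y,  g4 = (NOT x) AND (NOT y),  g5 = (NOT g3) AND (NOT g4)
XOR_GATES = [(3, 2, 4), (4, 3, 5), (5, 7, 9)]
print(eval_aig({1: True, 2: False}, XOR_GATES, [10]))
```

The example shows that three AND-gates and complemented edges suffice for XOR; circuit size in this setting is measured by the number of such gates.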

Binary decision diagrams [6,17] (BDDs) are a well known data structure for 
representing and manipulating Boolean functions. A binary decision diagram is a 
rooted, directed acyclic graph. Its internal nodes represent decisions based on 
the values of Boolean variables, directing the path to one of the two child nodes, 
via the "true" edge (depicted as a solid arrow) and the "false" edge (depicted as a 
dashed arrow). Reaching the terminal node “1” indicates that the represented 
Boolean function evaluates to true for that particular valuation, and reaching 
the “0” node indicates a false evaluation. BDDs are recognized as a canonical 
representation of Boolean functions when they meet two conditions. First, they 
must be ordered; that is, they follow a fixed variable ordering when encountering 
Boolean variables. Second, they must be reduced, meaning that any redundant 
decision nodes with identical successors are eliminated [6]. BDDs can be incredibly 
efficient if a suitable variable ordering is found and the represented set is encoded 
in a way that results in small decision diagrams. 
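
The two conditions can be illustrated with a minimal Python sketch of a reduced ordered BDD. This is not how Sylvan or CUDD are implemented; the class `BDD` and its methods are hypothetical, and the `build` method enumerates all valuations, so it is only usable for a handful of variables. It does show why the representation is canonical: equivalent functions end up at the same node.

```python
class BDD:
    """Reduced ordered BDD nodes stored in a unique table."""

    def __init__(self, num_vars):
        self.num_vars = num_vars
        self.nodes = {0: None, 1: None}   # terminal nodes 0 and 1
        self.unique = {}                  # (var, low, high) -> node id

    def mk(self, var, low, high):
        if low == high:                   # reduction: drop a redundant test
            return low
        key = (var, low, high)
        if key not in self.unique:        # reduction: share identical nodes
            node_id = len(self.nodes)
            self.nodes[node_id] = key
            self.unique[key] = node_id
        return self.unique[key]

    def build(self, f, var=0, assignment=()):
        """Shannon-expand the predicate f over variables 0..num_vars-1 (fixed order)."""
        if var == self.num_vars:
            return 1 if f(*assignment) else 0
        low = self.build(f, var + 1, assignment + (False,))
        high = self.build(f, var + 1, assignment + (True,))
        return self.mk(var, low, high)

    def evaluate(self, node, assignment):
        while node > 1:
            var, low, high = self.nodes[node]
            node = high if assignment[var] else low
        return node == 1


bdd = BDD(3)
f1 = bdd.build(lambda a, b, c: (a and b) or c)
f2 = bdd.build(lambda a, b, c: not ((not a or not b) and not c))
print(f1 == f2)   # equivalent functions share one root node
```

Because of canonicity, checking equivalence of two Boolean functions reduces to comparing two node identifiers.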

Multi-terminal binary decision diagrams (MTBDDs) extend BDDs by allowing 
terminal nodes to hold various types of data, not just the Boolean values true 
and false. The MTBDD implementation in Sylvan [14] in particular allows for 
terminal nodes to be labeled by 64-bit values. These labels can represent a wide 
range of data, including 64-bit integers, pointers, floating-point numbers, or even 
pairs of 32-bit values. 

Fig. 1. Overview of Knor from input file to output file. 


3 Knor 


We study reactive synthesis from parity automata to Boolean circuits in the 
Knor research tool. Knor is written in C++ and is publicly available under a 
permissive license via https://www.github.com/trolando/knor. See Fig. 1 for an 
overview of Knor. All steps of the program are discussed in the following sections. 


3.1 Input format 


Knor reads input files formatted using the extended Hanoi Omega-Automata 
(HOA) format [31]. 

The HOA format [1] is a file format to describe finite-state automata that 
accept sets of infinite words. The automata consist of a finite set of states $Q$, one or 
more initial states $I \subseteq Q$, a set of atomic propositions $AP$, and a labeled transition 
relation $\Delta \subseteq Q \times \mathcal{B}(AP) \times Q$, where each transition is labeled with a Boolean 
formula $\phi \in \mathcal{B}(AP)$, and $\mathcal{B}(AP)$ denotes the set of Boolean formulas 
over $AP$. Furthermore, the HOA format describes an acceptance condition of 
the automaton, i.e., a set of infinite runs of the automaton which are considered 
accepting. For the purposes of the current paper, we are only interested in the 
parity condition, i.e., a run is accepting if and only if the lowest/highest 
priority seen infinitely often along the run is even/odd, depending on whether 
the acceptance condition is min even, min odd, max even or max odd. In the 
HOA format, the priorities are either on states or on transitions. 
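
The four parity variants differ only in which priority is decisive and which parity of it is accepting. A minimal Python sketch, in which the function name `parity_accepts` and its parameters are hypothetical and the set of priorities seen infinitely often is assumed to be given:

```python
def parity_accepts(inf_priorities, use_max, even):
    """Parity acceptance for the four variants min/max x even/odd.

    inf_priorities: non-empty set of priorities seen infinitely often on a run
    use_max:        True for "max ...", False for "min ..."
    even:           True for "... even", False for "... odd"
    """
    decisive = max(inf_priorities) if use_max else min(inf_priorities)
    return (decisive % 2 == 0) == even


run = {1, 2}  # priorities 1 and 2 both occur infinitely often
for use_max in (False, True):
    for even in (True, False):
        name = ("max" if use_max else "min") + " " + ("even" if even else "odd")
        print(name, parity_accepts(run, use_max, even))
```

The same run can thus be accepting under one variant and rejecting under another, which is why a tool reading HOA files has to honour the declared variant.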

The extended HOA format adds a distinction between controllable (output) 
and uncontrollable (input) atomic propositions [31]. 
Fig. 2. Splitting a transition on the parity automaton (left) to construct the parity 
game (right), with priorities on the states (above) or on the transitions (below). We 
depict states by squares, vertices of the environment player by pentagons and vertices 
of the controller player by circles. 


3.2 Output format 


Knor can produce parity games in the standard PGSolver [20] format that is also 
accepted by Oink, as well as Boolean circuits in the AIGER format [3]. 


3.3 Translation from automaton to game 


As described above, the parity automaton consists of a number of states with 
transitions labeled by a Boolean formula, and with the priorities either on the 
transitions or on the states. 

To translate the automaton to a parity game, we need to split every transition 
into two parts. The environment player “moves first” by choosing a valuation of 
the input signals, and the controller player responds by setting output signals such 
that the specification is guaranteed. That is, the output signals are determined 
by the current state and the current input signals. 

We propose three methods to convert the parity automaton to a parity game: 
a naive explicit method, a half-symbolic method and a fully symbolic method. 


(Naive) Explicit method. The explicit method simply creates a parity game 
vertex for every state in the parity automaton, and then splits the transitions 
into two parts as in Fig. 2. 
For every valuation i of the input signals, we create an intermediate vertex 
that is controlled by the controller player. This intermediate vertex should have 
the least relevant priority, typically 0. For every transition with a label (Boolean 
formula) that is satisfiable for i, we then create an edge from the intermediate 
vertex to the successor of the transition. 

Since we want our parity games to have priorities on the vertices and not on 
the edges, we need to create extra vertices in case the automaton has priorities 
on transitions. This is also shown in Fig. 2. Priorities on the source vertex, 
intermediate vertex, and target vertices should be set to the least relevant 
priority (typically 0) or be ignored by the solver. 

The result is an explicit parity game which Knor directly constructs using 
Oink. The game is then solved with any algorithm implemented by Oink. 
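
The explicit method for automata with priorities on states can be sketched in Python as follows. This is an illustration under stated assumptions, not Knor's implementation: the names `valuations` and `split_automaton` are hypothetical, transition labels are modelled as Python predicates over the set of true propositions, and satisfiability of a label for an input valuation is checked by brute force over all output valuations.

```python
from itertools import product


def valuations(props):
    """All truth assignments over props, as frozensets of the true propositions."""
    for bits in product([False, True], repeat=len(props)):
        yield frozenset(p for p, b in zip(props, bits) if b)


def split_automaton(states, priority, transitions, inputs, outputs):
    """Build an explicit game: state vertex -> intermediate vertex -> state vertex."""
    owner, prio, edges = {}, {}, set()
    for q in states:
        owner[("state", q)] = "env"
        prio[("state", q)] = priority[q]
        for i in valuations(inputs):
            mid = ("mid", q, i)
            owner[mid] = "ctrl"
            prio[mid] = 0                      # least relevant priority
            edges.add((("state", q), mid))
            for src, label, dst in transitions:
                # keep the transition if some output valuation satisfies its label
                if src == q and any(label(i | o) for o in valuations(outputs)):
                    edges.add((mid, ("state", dst)))
    return owner, prio, edges


# Hypothetical automaton with input r (request) and output g (grant).
example = split_automaton(
    states=["q0", "q1"],
    priority={"q0": 2, "q1": 1},
    transitions=[
        ("q0", lambda v: "g" in v or "r" not in v, "q0"),
        ("q0", lambda v: "r" in v and "g" not in v, "q1"),
        ("q1", lambda v: "g" in v, "q0"),
        ("q1", lambda v: "g" not in v, "q1"),
    ],
    inputs=["r"],
    outputs=["g"],
)
owner, prio, edges = example
print(sum(1 for v in owner if owner[v] == "ctrl"))  # intermediate vertices
```

The number of intermediate vertices grows as the number of states times the number of input valuations, which is the blowup that motivates the symbolic methods.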


Half-symbolic method. The fully explicit method works reasonably well for 
many of the smaller input models; however, some models result in a significant 
exponential blowup of the parity game, as any game with $n$ input signals has $2^n$ 
outgoing edges per source vertex. The extended HOA format actually encodes the 
labels on the transitions symbolically using Boolean formulas, so an exponential 
blowup in some cases can be expected. We propose a method that still results 
in an explicit game constructed using Oink, but that employs binary decision 
diagrams to reduce the number of intermediate vertices and extra transitions in 
the parity game. 

For every state, we produce a multi-terminal binary decision diagram (MTBDD) 
encoding all outgoing transitions, with decision variables representing input sig- 
nals ordered before variables representing output signals, and terminal nodes 
encoding both priority and successor state as a pair of two 32-bit numbers. 
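
Storing a pair of 32-bit numbers in one 64-bit terminal value can be sketched as follows. The layout, with the priority in the upper half and the successor state in the lower half, is an assumption made for this illustration; the text only states that both are stored as a pair. The names `pack_leaf` and `unpack_leaf` are hypothetical.

```python
MASK32 = (1 << 32) - 1


def pack_leaf(priority, successor):
    """Combine two unsigned 32-bit numbers into one 64-bit terminal value."""
    if not (0 <= priority <= MASK32 and 0 <= successor <= MASK32):
        raise ValueError("both components must fit in 32 bits")
    return (priority << 32) | successor


def unpack_leaf(value):
    """Recover (priority, successor) from a packed 64-bit terminal value."""
    return value >> 32, value & MASK32


leaf = pack_leaf(3, 42)
print(unpack_leaf(leaf))
```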

We then collect all subroots of the MTBDD after the input signals, i.e., 
along each path from the root node to a terminal node, we find the first node that 
is either a decision node with a variable of an output signal, or a terminal node. 
For every such node N, we create a corresponding intermediate vertex owned 
by the controller player. The paths leading to N correspond to valuations of 
the input signals that lead to that intermediate vertex, where the controller can 
decide how to respond. We let the controller choose to go to any state (vertex) 
encoded by a terminal node that is reachable from N. For every such terminal 
node, we simply add an edge from the intermediate vertex to the target vertex. 


Fully-symbolic method. While the half-symbolic method already results in a 
major reduction in the size of the parity games, we can go further and encode 
the full transition relation of the parity automaton as a single BDD, which can 
then automatically be interpreted as a symbolic parity game simply by ordering 
variables as follows: 


1. Variables s corresponding to the source state. 
2. Variables i corresponding to input signals. 
3. Variables o corresponding to output signals. 
4. Variables p and s' corresponding to the priority (either from the transition 
or from the target state) and the target state. 


One can read this BDD intuitively as follows: given some current state (1) 
and some current input values (2), if the controller sets certain output values (3) 
we arrive with some priority at our next state (4). Variables within these four 
groups can be ordered freely; however, we implement a naive approach and have 
not optimized this ordering; this is left as an opportunity for future work. 

Since we encode the entire automaton as a single BDD, states that share 
some transitions can benefit from the automatic reduction offered by BDDs. 

We present a translation from this symbolic parity game to an explicit parity 
game that explicitly uses the structure of the decision diagram to construct the 
game. This procedure consists of the following steps: 


1. We create a state vertex controlled by the environment player for every state 
(with transitions) in the symbolic parity game. These vertices get priority 0. 

2. Along each path in the BDD, we find the first decision node after the input 
signals. We create an intermediate vertex controlled by the controller 
player for every such node. These vertices also get priority 0. 

3. Along each path in the BDD, we find the first decision node after the output 
signals. We decode the priority and the target state and create a priority 
vertex for the environment player with the decoded priority and with a 
single edge to the state vertex corresponding to the target state. 

4. For every state, we compute the reachable decision nodes of step 2 and create 
edges from the state vertices to the intermediate vertices. 

5. For every decision node of step 2, we compute the reachable decision nodes of 
step 3 and create edges from the intermediate vertices to the priority vertices. 


Further improvements to this procedure are possible by considering that 
vertices may share many transitions, and additional vertices could be added 
based on the structure of the BDD. This could reduce the number of edges at 
the cost of more vertices. Furthermore, we do not merge the state vertices and 
priority vertices, which might reduce the number of vertices. This is left as an 
opportunity for future work. 


3.4 Solving the parity game 


Using the procedure described above, we can produce an explicit parity game 
that can be solved by Oink. As shown in [16], several solvers implemented in 
Oink are very efficient for parity games derived from reactive synthesis: 


- strategy iteration (psi) [11,19] 
- tangle learning (tl) [10] 
- priority promotion (npp) [2,11] 
- Zielonka's recursive algorithm (zlk) [11,35] 
- fixpoint iteration using freezing (fpi) [16] 
- fixpoint iteration using justifications (fpj) [24] 
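
To illustrate what solving a parity game involves, the following Python sketch gives a textbook version of Zielonka's recursive algorithm for max-parity games with priorities on vertices. It is not Oink's optimized implementation. The function names and the game representation are assumptions for this sketch: players are 0 (Even) and 1 (Odd), `vertices` is a set, `edges` maps each vertex to its successors, and every vertex is assumed to have at least one successor.

```python
def attractor(vertices, owner, edges, player, target):
    """Vertices of the subgame from which `player` can force a visit to `target`."""
    attr = set(target)
    changed = True
    while changed:
        changed = False
        for v in vertices - attr:
            succs = [w for w in edges[v] if w in vertices]
            if owner[v] == player:
                pulled = any(w in attr for w in succs)
            else:
                pulled = all(w in attr for w in succs)
            if pulled:
                attr.add(v)
                changed = True
    return attr


def zielonka(vertices, owner, priority, edges):
    """Return (W0, W1), the winning regions of Even and Odd."""
    if not vertices:
        return set(), set()
    top = max(priority[v] for v in vertices)
    player = top % 2                    # the player who likes the top priority
    opponent = 1 - player
    target = {v for v in vertices if priority[v] == top}
    a = attractor(vertices, owner, edges, player, target)
    won = zielonka(vertices - a, owner, priority, edges)
    if not won[opponent]:
        result = [set(), set()]
        result[player] = set(vertices)  # player wins the whole subgame
        return result[0], result[1]
    b = attractor(vertices, owner, edges, opponent, won[opponent])
    won = list(zielonka(vertices - b, owner, priority, edges))
    won[opponent] = won[opponent] | b
    return won[0], won[1]


# Tiny example: Even owns a (priority 2), Odd owns b (priority 1).
W0, W1 = zielonka(
    vertices={"a", "b"},
    owner={"a": 0, "b": 1},
    priority={"a": 2, "b": 1},
    edges={"a": ["a", "b"], "b": ["b", "a"]},
)
print(W0, W1)
```

In the example, Even wins from a by looping on priority 2, while Odd wins from b by looping on priority 1.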
We also implement a symbolic solver based on [25]. This symbolic solver 
implements fixpoint iteration with freezing using BDD operations, and operates 
directly on the BDD obtained by the fully-symbolic translation. 


3.5 Post-processing the strategy 


After applying the strategy to the symbolic parity game, we perform two post- 
processing steps. In the case that the strategy does not give all output signals a 
value, we default to setting output signals to false (or 0). We also compute all 
reachable vertices of the parity game from the initial state vertex, restricted to 
the winning strategy, and remove unreachable vertices. 
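
The reachability step can be sketched on an explicit game graph as follows. Knor performs this step on its own game representation; the explicit version below, with the hypothetical function `reachable_under_strategy`, only illustrates the idea that environment vertices keep all their successors while controller vertices keep only the successor chosen by the strategy.

```python
from collections import deque


def reachable_under_strategy(initial, owner, edges, strategy):
    """Vertices reachable from `initial` when the controller follows `strategy`."""
    seen = {initial}
    queue = deque([initial])
    while queue:
        v = queue.popleft()
        # controller vertices follow the strategy, environment vertices branch freely
        succs = [strategy[v]] if owner[v] == "ctrl" else edges[v]
        for w in succs:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen


# Hypothetical game: s0, s1 are environment vertices, m0, m1 controller vertices.
game_edges = {"s0": ["m0", "m1"], "m0": ["s0", "s1"], "m1": ["s0"], "s1": ["m0"]}
game_owner = {"s0": "env", "s1": "env", "m0": "ctrl", "m1": "ctrl"}
kept = reachable_under_strategy("s0", game_owner, game_edges,
                                strategy={"m0": "s0", "m1": "s0"})
print(sorted(kept))
```

In the example, vertex s1 is dropped because the strategy never moves there.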


3.6 Bisimulation minimisation 


To further reduce the number of vertices of the parity game, we apply bisimulation 
minimisation. Bisimulation minimisation computes equivalence classes of vertices, 
i.e., all vertices that have the same behavior w.r.t. input and output signals. We 
use the signature-based partition refinement approach of [15]. 

Recall that the symbolic parity game is a BDD over the variables s, i, o, p, s' 
as described in Sec. 3.3. We first drop the priority variables p from the BDD, as 
the priorities on the states are not relevant after solving. We reserve fresh BDD 
variables c for the classes, which are ordered after the next state variables, i.e., 
$s < i < o < s' < c$. We maintain the current assignment from states to classes 
in a BDD over variables s' and c. The reason for s' rather than s is that this 
reduces the number of BDD operations. The initial partition assigns all states 
to a single equivalence class. We then repeatedly compute the current signature 
of all states, which is a BDD encoding for every state the classes that can be 
reached and the input/output values to reach them, as follows: 


1. Given a BDD G encoding the symbolic parity game over the variables s, i, o, s', 
and a BDD P encoding the current partition over the variables s' and c, we 
compute the BDD S representing the signatures over variables s, i, o, c by 
performing the operation and_exists(G, P, s'). 

2. We use the refine operation of [15] to replace the signatures (over variables 
i, o, c) in S by new classes, reusing previous class identifiers whenever possible, 
and renaming s variables to s' variables on-the-fly, resulting in the next BDD 
P over the variables s' and c. 

3. We repeat steps 1 and 2 until the number of classes is stable. 


Afterwards, we apply the obtained partition by replacing the states in the symbolic 
parity game by the equivalence classes. 
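
The refinement loop above can be illustrated on an explicit transition relation. The sketch below is not the BDD-based procedure of [15]: it uses Python dictionaries instead of BDDs, and the name `minimise` and the data layout are assumptions made for the example. It starts from a single class, recomputes signatures, and stops when the number of classes is stable.

```python
def minimise(transitions):
    """Signature-based partition refinement on an explicit structure.

    transitions[state] is a set of (input, output, next_state) triples.
    Returns a dict mapping each state to its equivalence class.
    """
    partition = {state: 0 for state in transitions}  # one initial class
    while True:
        # Signature of a state: which classes it reaches, and with
        # which input/output values.
        signatures = {
            state: frozenset((i, o, partition[t]) for (i, o, t) in edges)
            for state, edges in transitions.items()
        }
        # States with equal signatures share a class identifier.
        class_ids = {}
        refined = {
            state: class_ids.setdefault(signatures[state], len(class_ids))
            for state in transitions
        }
        if len(class_ids) == len(set(partition.values())):
            return refined
        partition = refined


transitions = {
    0: {("a", "x", 1), ("b", "y", 2)},
    1: {("a", "x", 0), ("b", "x", 0)},
    2: {("a", "x", 0), ("b", "x", 0)},
}
classes = minimise(transitions)
```

Here states 1 and 2 have identical behavior and end up in the same class, so the quotient has two states instead of three.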


3.7 Encoding the strategy as a circuit 


There are several methods to create a Boolean circuit from the solved parity game. 
We first need to encode all reachable states of the parity game as latches in the 
Boolean circuit. We employ two methods for this: (1) one latch per state; and (2) 
one latch per BDD state variable. We call the former method onehot and the 
latter binary; in the first case at all times only a single latch is set, whereas in 
the second case the latches form a binary encoding of the states, similar to how 
they are encoded in the symbolic parity game. As the initial state of a Boolean 
circuit has all latches reset (to 0), we invert the latch that encodes the initial 
state for the onehot encoding and we encode the initial state as state 0 for the 
binary encoding. 

Fig. 3. Sketch of the encoding from a BDD decision node (left) to three AND-gates 
(right), representing the Boolean formula $(F_{x=1} \wedge x) \vee (\neg x \wedge F_{x=0})$. 

We then compute a BDD F for every latch and for every output signal, where 
F is a BDD over the variables s, i (current state and current input signals) such 
that the latch or signal will be set if and only if F evaluates to true. We then 
translate each BDD F to an and-inverter graph. Again we propose two methods 
to achieve this: 


— by using Shannon expansion (ITE) as in Fig. 3 recursively; 

— by first obtaining the irredundant sum-of-products [28] (ISOP) of F in the 
form of a ZBDD [29], which can then directly be translated to an AIG: first all 
products are created, and then the products are connected through inverted 
AND-gates (as $ab \vee cd = \neg(\neg(ab) \wedge \neg(cd))$). 


We thus have four combinations: ITE with binary or onehot encoding and 
ISOP with binary or onehot encoding. Furthermore, we use a cache when creating 
AND-gates to avoid duplicate gates. 
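
A minimal sketch of the ITE method with a gate cache follows. It assumes a toy BDD representation (nested tuples `(variable, low, high)` with Boolean terminals) rather than Sylvan's data structures, and uses the AIGER literal convention (literal `2v` for variable `v`, lowest bit set for negation). The class `Aig` and the function names are hypothetical.

```python
FALSE, TRUE = 0, 1  # AIGER constant literals


class Aig:
    def __init__(self, num_inputs):
        self.num_inputs = num_inputs
        self.num_vars = num_inputs   # variables 1..num_inputs are inputs
        self.gates = {}              # (lit, lit) -> output literal (cache)

    def input_lit(self, index):
        return 2 * (index + 1)

    def and_gate(self, a, b):
        if a > b:
            a, b = b, a
        if a == FALSE or a == (b ^ 1):
            return FALSE             # x AND 0, or x AND NOT x
        if a == TRUE or a == b:
            return b                 # 1 AND x, or x AND x
        if (a, b) not in self.gates:  # cache lookup avoids duplicates
            self.num_vars += 1
            self.gates[(a, b)] = 2 * self.num_vars
        return self.gates[(a, b)]


def bdd_to_aig(aig, node, memo):
    """Shannon expansion: (F1 AND x) OR (NOT x AND F0), via three ANDs."""
    if node is True:
        return TRUE
    if node is False:
        return FALSE
    if node in memo:
        return memo[node]
    var, low, high = node
    x = aig.input_lit(var)
    then_part = aig.and_gate(bdd_to_aig(aig, high, memo), x)
    else_part = aig.and_gate(bdd_to_aig(aig, low, memo), x ^ 1)
    # OR is expressed as an AND of negations with a negated output.
    memo[node] = aig.and_gate(then_part ^ 1, else_part ^ 1) ^ 1
    return memo[node]


def evaluate(aig, literal, inputs):
    values = [False] * (aig.num_vars + 1)
    for index, value in enumerate(inputs):
        values[index + 1] = value

    def lit_value(lit):
        return values[lit >> 1] ^ bool(lit & 1)

    for (a, b), out in aig.gates.items():  # creation order is topological
        values[out >> 1] = lit_value(a) and lit_value(b)
    return lit_value(literal)


aig = Aig(num_inputs=2)
xor_bdd = (0, (1, False, True), (1, True, False))  # x0 XOR x1
root = bdd_to_aig(aig, xor_bdd, {})
```

For this XOR example the constant simplifications in `and_gate` remove all gates at the leaves, so only the root decision node contributes AND-gates.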


3.8 Post-processing with ABC 


After encoding the strategy as a circuit, we apply optional post-processing of the 
circuit using ABC [5]. 

Similar to Strix, we apply the compress2rs script, which is described in [4]. 
The compress2rs script performs rewriting, refactoring, balancing, and 
truth-table-based resubstitution. While Strix applies the script until no further 
improvement is found, we halt when the improvement is less than 2.5%. 

We also apply a sequence of three ABC commands, drw, balance and drf, 
which we call the drewrite script here. We apply this script until the improvement 
is less than 1%. 
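
The stopping rule can be sketched as a simple loop. The sketch does not call ABC; `run_script` and `size` are hypothetical callables standing in for one pass of a script and for counting gates, and keeping the smaller of the last two circuits is an assumption, as the text does not specify it.

```python
def optimise(circuit, run_script, size, min_improvement):
    """Repeat run_script until the relative size reduction of one pass
    drops below min_improvement (e.g. 0.025 for 2.5%, 0.01 for 1%)."""
    current = size(circuit)
    while current > 0:
        candidate = run_script(circuit)
        new = size(candidate)
        if current - new < min_improvement * current:
            # Assumption: keep whichever of the last two circuits is smaller.
            return candidate if new < current else circuit
        circuit, current = candidate, new
    return circuit


# Toy stand-in: a "circuit" is just its gate count, and one pass halves
# it until a floor of 125 gates is reached.
final = optimise(1000, lambda n: max(n // 2, 125), lambda n: n, 0.025)
```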


3.9 Usage of Knor 


Knor expects an eHOA file on standard input; it also accepts a filename as 
a command line parameter instead. With the options -a and -b, Knor writes 
the constructed circuits to standard output as an AIGER file in ASCII or 
binary format respectively. With the option -v, Knor prints timings and other 
information to standard error. 

By default, Knor uses the fully symbolic translation to a parity game. One can 
use --naive for the naive explicit encoding and --explicit for the half-symbolic 
encoding, and --print-game to print the resulting parity game in PGSolver 
format to standard output. Only the fully symbolic translation supports the full 
synthesis pipeline. 

To choose an explicit-state solver of Oink, one can pick any solver from the list 
obtained with --solvers, in particular the solvers --tl, --npp, --fpi, --fpj, 
--psi, and --zlk. To solve using the symbolic solver, use --sym. With the option 
--real, Knor will only decide realizability and use tangle learning (--tl) as the 
default solver. The default solver for synthesis is the symbolic solver (--sym). 

Bisimulation minimisation is applied by default, unless the --no-bisim option 
is used. To encode the circuit, Knor uses by default ITE and onehot encoding. 
To change this one can use the options --isop and --binary. To apply post- 
processing with ABC after constructing the circuits, use the options --compress 
and --drewrite. 
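
As an illustration of how these options combine, the following sketch assembles a command line from the flags listed above. The executable name `knor`, the position of the filename, and the helper `knor_command` itself are assumptions; only the flags are taken from the text.

```python
import shlex

EXPLICIT_SOLVERS = {"tl", "npp", "fpi", "fpj", "psi", "zlk"}


def knor_command(input_file, *, solver="sym", aiger="ascii", bisim=True,
                 isop=False, binary=False, compress=False, drewrite=False,
                 verbose=False, executable="knor"):
    """Build an argument list for a synthesis run (hypothetical helper)."""
    if solver != "sym" and solver not in EXPLICIT_SOLVERS:
        raise ValueError(f"unknown solver: {solver}")
    if aiger not in ("ascii", "binary"):
        raise ValueError(f"unknown AIGER format: {aiger}")
    args = [executable, input_file]
    args.append("-a" if aiger == "ascii" else "-b")  # AIGER output format
    args.append("--" + solver)                       # parity game solver
    if not bisim:
        args.append("--no-bisim")                    # minimisation is on by default
    if isop:
        args.append("--isop")                        # default is ITE
    if binary:
        args.append("--binary")                      # default is onehot
    if compress:
        args.append("--compress")
    if drewrite:
        args.append("--drewrite")
    if verbose:
        args.append("-v")
    return args


command = knor_command("spec.ehoa", solver="fpj", isop=True, compress=True)
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run`, redirecting standard output to the desired AIGER file.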


4 Empirical Evaluation 


We present the empirical results here. 


4.1 Benchmarking 


We evaluate the techniques implemented in Knor using the benchmarks of 
SYNTCOMP for the PGAME track that come from reactive synthesis, i.e., 
they are based on LTL specifications in the TLSF file format. In recent years, 
SYNTCOMP has also incorporated benchmarks in the PGAME track that do not 
come from reactive synthesis, such as artificial hard games that are designed to 
be time consuming for specific parity game solvers. Oink can easily handle such 
hard games by using a solver for which no hard game has been designed yet, and 
since our aim is to develop techniques for reactive synthesis specifically, we limit 
ourselves to benchmarks from the TLSF dataset 
(https://github.com/SYNTCOMP/benchmarks/tree/v2023.4/parity/tlsf_based). 
We also exclude input files that 
are not parity automata; this removes the aut*.ehoa files, two test*.ehoa files, 
and UnderapproxStrengthenedDemo, which is a Büchi automaton consisting of 
a single state. In total 288 input files remain. 

The benchmarks are run on a machine with an Intel i5-13600KF processor. 
This is a 14-core processor, but we only use a single thread. Knor is compiled using 
gcc version 13.2.1. We repeat benchmarks 5 times and take the median to obtain 
the runtimes. All experimental scripts and log files are available as [12], and are 
also available online via http://www.github.com/trolando/knor-experiments. 


Model                       explicit       half-symbolic   symbolic
amba_decomposed_lock_15     T.O.           46              24
amba_decomposed_lock_14     T.O.           46              24
amba_decomposed_lock_13     T.O.           46              24
TwoCountersDisButA9         T.O.           668,065         7,249
amba_decomposed_lock_12     402,997,254    46              24
amba_decomposed_lock_11     100,820,998    46              24
amba_decomposed_lock_10     25,237,510     46              24
TwoCountersGui              21,022,475     256             155
TwoCountersDisButA8         15,254,863     497,310         4,721
full_arbiter_8              11,287,306     1,669,066       177,690
amba_decomposed_lock_9      6,323,718      46              24
amba_decomposed_encode_16   4,981,507      876             330
TwoCountersDisButA7         3,939,305      98,947          2,365
TwoCountersDisButA6         3,806,249      101,175         1,733

Table 1. Sizes in number of vertices of the largest parity games, sorted descending by 
size of parity games constructed using the explicit method. 


Technique       Sum of Vertices   Time (sec)
explicit        622,987,565       1,177.91
half-symbolic   8,491,540         18.28
symbolic        620,510           11.76

Table 2. Cumulative size of parity games and time required for construction of the 
parity games of the 284 inputs that could be constructed by all three techniques. 


4.2 Translating the parity automaton to a parity game 


We first compare the three different techniques to obtain a parity game from the 
parity automaton: explicit, half-symbolic (only symbolic splitting) and fully 
symbolic. 

Of the 288 benchmarks, the explicit method could not construct the parity 
game for four benchmarks within the timeout of 3600 seconds. See Table 1 for the 
largest parity games constructed by the explicit method, as well as the four input 
models for which no parity game could be constructed within 3600 seconds. The 
two other methods could construct the parity games within a reasonable amount 
of time, as is displayed in Table 2. The given time is only the time required for 
constructing the games and excludes time required for parsing the input file, 
which is the same for all methods. 

Clearly, the fully symbolic method is superior to the other methods, both in 
the speed of construction and in the size of the constructed parity games. When 
we consider individual input models, we find 20 cases where the half-symbolic 
approach results in slightly smaller parity games than the fully symbolic approach. 
The largest difference is 13 vertices (100 vertices instead of 113 vertices), which 
is negligible compared to the several orders of magnitude advantage that the 
fully symbolic method has in larger parity games, as Table 1 demonstrates. The 
cumulative time for the fully symbolic method is dominated by a handful of input 
models that require more than a second. Almost all parity games are constructed 
in fewer than 10 milliseconds. 


Solver                                 Circuit size          Time (sec)
                                       binary     onehot
symbolic fpi (--sym)                   317,403    122,514    18.45
fixpoint with justifications (--fpj)   350,035    139,900    0.16
fixpoint with freezing (--fpi)         353,120    140,297    0.22
strategy iteration (--psi)             334,149    140,916    0.57
priority promotion (--npp)             427,048    161,244    0.17
Zielonka (--zlk)                       480,472    175,427    0.18
tangle learning (--tl)                 604,044    213,632    0.17

Table 3. Cumulative circuit size in number of gates and cumulative solving time in 
number of seconds for the tested parity game solvers. 

Although the size of the parity game does not necessarily always correspond 
to the size of the constructed circuit or the required time for the entire synthesis 
process, it seems an obvious choice to only consider the fully symbolic translation 
in the remainder of this study. 


4.3 Solving the parity game 


We consider several parity game solvers, which have been shown in the past 
to be successful for solving games derived from synthesis: Zielonka's recursive 
algorithm, priority promotion, tangle learning, the two fixpoint algorithms using 
freezing and justifications, strategy iteration, and symbolic fixpoint iteration. One 
of these, symbolic fixpoint iteration, directly operates on the symbolic parity 
game constructed by the fully symbolic method. All other solvers require the 
procedure outlined in Sec. 3.3 to translate the symbolic representation to an 
explicit game. The game is then solved, and we construct the circuit using the 
standard ITE encoding and either the binary or the onehot encoding of the states. 
We do not yet perform bisimulation minimisation or postprocessing using ABC. 

The reason that it is interesting to consider different solvers is that different 
solvers may result in entirely different strategies to win the parity game. In 
particular, it may be that some solvers favor winning regions that reach either 
higher priorities or lower priorities, which can result in significant differences. 
This is in fact supported by the results presented here. 

We report runtimes for solving the parity games (thus excluding time 
before solving and after solving) as well as the sizes of the circuits in Table 3. 

Fig. 4. Cactus plot of the number of parity games that can be solved within the given 
amount of time per solver. 


Model tl sym pp psi zlk fpi fpj 

generalized_buffer_unreal1 0.02 0.02 0.14 0.02 0.03 0.02 
generalized_buffer 0.01 0.01 0.07 0.01 20.02 0.01 
genbuf2 0.01 0.01 0.03 0.01 0.01 0.01 
full_arbiter_unreal3 0.00 0.00 0.06 0.00 0.02 0.01 
amba_decomposed_arbiter_10 0.02 0.01 0.04 0.01 0.02 0.02 
full_arbiter_8 0.02 0.02 0.08 0.02 0.02 0.02 


Table 4. Overview of individual runtimes of each solver in seconds for the benchmarks 
for which at least one solver requires at least 500 milliseconds. 


We observe that only the symbolic algorithm requires any time at all. The 
other algorithms each require less than a second to solve all benchmarks! When 
we consider the circuit sizes, the fully symbolic algorithm is superior with a 
cumulative 122,514 gates for all circuits. If we are interested in the best solver 
that solves all benchmarks in a fraction of a second, then clearly FPJ is the best 
algorithm, with a cumulative time of 0.16 seconds and a cumulative circuit size 
of 139,900 gates, although the difference with FPI is not that great. 


Remarks. The solving time with the symbolic fixpoint iteration algorithm is 
dominated by just a few benchmarks. All algorithms solve the vast majority of 
parity games in a fraction of a second. See Fig. 4. Notice the logarithmic scale 
and that the vast majority of models are computed within a second for all solvers. 
Just a few models require more than 500 milliseconds to be solved, as is shown 
in Table 4. 

We also did not take parallel operation into account. The symbolic FPI 
solver, the explicit FPI solver, and the strategy iteration solver have parallel 
implementations; the symbolic solver leverages the automatic parallelisation of 
decision diagram operations in Sylvan. 

Solver                                        Circuit size          Time (sec)
                                              binary     onehot
symbolic fpi (--sym) + minimisation           166,839    106,500    0.19
fixpoint with justifications (--fpj) + min.   205,937    124,489    0.15
symbolic fpi (--sym)                          317,403    122,514    -
fixpoint with justifications (--fpj)          350,035    139,900    -


Table 5. Cumulative circuit size in number of gates and cumulative minimisation time 
in number of seconds for the symbolic fpi and the fixpoint with justifications solvers, 
with and without bisimulation minimisation after solving. 


Solver                                 Encoding       Circuit size   Time
symbolic fpi (--sym)                   ISOP, onehot   102,294        0.69
symbolic fpi (--sym)                   ITE, onehot    106,500        0.61
fixpoint with justifications (--fpj)   ISOP, onehot   113,134        0.72
fixpoint with justifications (--fpj)   ITE, onehot    124,489        0.64
symbolic fpi (--sym)                   ITE, binary    166,839        0.09
fixpoint with justifications (--fpj)   ITE, binary    205,937        0.12
symbolic fpi (--sym)                   ISOP, binary   431,316        1.39
fixpoint with justifications (--fpj)   ISOP, binary   476,502        1.61


Table 6. Cumulative circuit size in number of gates and cumulative encoding time in 
seconds for the symbolic fpi and fixpoint with justification solvers, after bisimulation 
minimisation, using different encodings to obtain the circuit. 


4.4 Bisimulation minimisation 


We study the effects of bisimulation minimisation for the fully symbolic fixpoint 
iteration solver and for the explicit fixpoint iteration with justifications solver 
implemented in Oink. 

As Table 5 shows, running bisimulation minimisation on the resulting strategy 
reduces the total circuit size in all cases. The required time to perform bisimulation 
minimisation is negligible with a cumulative time of a fraction of a second. 

Bisimulation minimisation does not always improve the circuit size. There are 
a few cases where the procedure slightly increases the circuit size. There are also 
several models where the circuit size is reduced by several orders of magnitude. 
Interestingly, in some cases the circuit size is reduced to 0 AND-gates. It seems 
worthwhile to always apply bisimulation minimisation. 


4.5 Encoding strategy to circuit 


We now consider different encodings from the BDD of the strategy to the controller 
circuit. See Table 6. Surprisingly, the combination of ISOP and a binary encoding 
leads to a significantly worse result; whereas using ISOP with a onehot encoding 
slightly reduces the circuit sizes, but not by a significant amount. 

Solver                                 Encoding   Method     Circuit size   Time
symbolic fpi (--sym)                   ISOP       compress   61,434         149.26
symbolic fpi (--sym)                   ITE        compress   62,506         121.27
fixpoint with justifications (--fpj)   ISOP       compress   71,240         125.29
fixpoint with justifications (--fpj)   ITE        compress   72,897         108.10
symbolic fpi (--sym)                   ISOP       drewrite   80,077         58.72
symbolic fpi (--sym)                   ITE        drewrite   80,425         53.21
fixpoint with justifications (--fpj)   ISOP       drewrite   80,454         60.88
fixpoint with justifications (--fpj)   ITE        drewrite   80,903         58.58
symbolic fpi (--sym)                   ISOP       -          102,294        44.88
symbolic fpi (--sym)                   ITE        -          106,500        39.81
fixpoint with justifications (--fpj)   ISOP       -          113,134        31.66
fixpoint with justifications (--fpj)   ITE        -          124,489        25.77


Table 7. Cumulative circuit size in number of gates for the two solvers, after bisimulation 
minimisation and using onehot encoding, then using different postprocessing methods to 
reduce circuit sizes. Given times are total times from parsing until writing, in seconds. 


Tool                    Circuit size
                        no post-processing   with post-processing
strix                   68,550               41,314
sym-bisim-isop-onehot   87,823               50,624
ltlsynt                 544,804              98,996


Table 8. Cumulative size of the circuits for the 201 realizable inputs that could be 
constructed by all three tools, before and after post-processing with ABC. 


Looking at individual benchmarks, we find that the most interesting differences 
occur with the full_arbiter_* and amba_decomposed_arbiter_* benchmarks. 
For these benchmarks, ISOP performs much worse than ITE with a binary 
encoding, but shows moderate improvement with the onehot encoding. 

While there are some differences in the encoding times between the different 
approaches, the cumulative encoding time is less than two seconds in all cases. 


4.6 Postprocessing with ABC 


Finally, we apply postprocessing of the constructed circuit using ABC. See Table 7 
for the results. We observe a very clear tradeoff of space and time. The best 
result is obtained by using the compress algorithm, which reduces the number 
of gates by about 40%, but this triples the runtime. 
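
The tradeoff can be checked against the figures of Table 7 for the symbolic fpi solver. The snippet below only recomputes ratios from the reported numbers; the variable names are chosen for this example.

```python
# (circuit size in gates, total time in seconds) from Table 7, --sym rows
baseline = {"ITE": (106_500, 39.81), "ISOP": (102_294, 44.88)}
compress = {"ITE": (62_506, 121.27), "ISOP": (61_434, 149.26)}

tradeoff = {}
for encoding in baseline:
    gates_before, time_before = baseline[encoding]
    gates_after, time_after = compress[encoding]
    reduction = 1 - gates_after / gates_before  # fraction of gates removed
    slowdown = time_after / time_before         # factor on total runtime
    tradeoff[encoding] = (reduction, slowdown)
    print(f"{encoding}: {reduction:.1%} fewer gates, {slowdown:.2f}x runtime")
```

Both encodings lose roughly 40% of their gates while the total runtime grows by a factor of slightly more than three.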


4.7 Comparison with other tools 


We compare Knor to the tools Strix [26] and ltlsynt [27,33,34]. We obtain the two 
competing tools from the SYNTCOMP 2023 artifact [21]. We use the following 
command lines, similar to those used in the SYNTCOMP 2023 competition, to 
run the tools: 


— Run Strix without post-processing in ABC: 

strix --auto --no-compress-circuit -t --hoa <filename> 

— Run Strix with post-processing in ABC: 

strix --auto -t --hoa <filename> 

— Run ltlsynt (without post-processing in ABC): 

ltlsynt --from-pgame=<filename> --aiger --verbose 


In the competition, ltlsynt had optional post-processing in ABC as part of 
the script rather than the executable. This script executed the following ABC 
commands: collapse;strash;refactor;rewrite. The Strix executable runs 
an embedded version of ABC, repeating the compress2rs script until no more 
improvement is found. To improve the fairness of the comparison, we change the 
post-processing for ltlsynt to start with collapse;strash, as this re-encoding of 
the circuit via binary decision diagrams significantly improves upon the circuit 
encoding by ltlsynt, followed by repeating the compress2rs script until there 
is no more improvement. This gives better results than obtained by ltlsynt in 
SYNTCOMP 2023. 

Only 208 of the 288 input files are realizable. Of these, Strix did not solve the 
following inputs within the 3600 seconds time limit: amba_decomposed_lock_14, 
amba_decomposed_lock_15, Automata325, Gamelogic, genbuf2, SPIPureNext, 
generalized_buffer. Except for amba_decomposed_lock_15, ltlsynt solved all 
inputs. Disregarding inputs that could not be solved by Strix or ltlsynt, we 
have 201 realizable inputs that can be solved within the time limit by all three 
tools. We provide the results with and without post-processing using ABC in 
Table 8. Considering individual results, we observe that Strix yielded smaller 
circuits in 142 cases (147 with post-processing) and Knor yielded smaller 
circuits in 47 cases (also 47 with post-processing). For the larger circuits, the 
amba_decomposed_arbiter_* inputs favored Knor (1527 vs 8282 gates, after 
post-processing), while Strix did better on the full_arbiter_* inputs (1594 vs 
26040 gates, after post-processing). 

Table 8 clearly shows that all tools benefit from the post-processing. While 
Strix gives the best results for circuit size, the cumulative circuit size of Knor is 
only 23% more. Knor solves the entire set of inputs, including post-processing by 
ABC, in about 2.5 minutes, while Strix and ltlsynt cannot solve some benchmarks 
within the time limit of 1 hour, before post-processing. 


5 Discussion 


In this work, we studied techniques to improve reactive synthesis of parity 
automata to Boolean circuits using a new tool named Knor. We proposed 
a number of techniques and empirically evaluated these techniques using the 
benchmarks of the SYNTCOMP competition derived from LTL specifications. 
Knor has won the PGAME track of the competition several times. 

The evidence presented in the empirical evaluation suggests that the best 
approach for deciding realizability is to use the fully symbolic translation from 
parity automaton to parity game, and any fast explicit-state parity game solver 
(like a tangle learning variation) for which no hard games have yet been designed. 
The latter is only needed to counteract any efforts aimed at impairing Knor’s 
performance in SYNTCOMP through the introduction of artificially difficult 
benchmarks. 

For synthesis, considering a low circuit size as our primary objective, the clear 
solution is to use either symbolic fpi (--sym) or fixpoint with justifications (--fpj), 
preferring the former at the cost of speed in a few benchmarks, always apply 
bisimulation minimisation (--bisim), use a onehot encoding (--onehot) with 
either ITE or ISOP encoding, and apply postprocessing using ABC's compress2rs 
script (--compress). 

Knor is publicly available via https://www.github.com/trolando/knor. 


Future work. There are many opportunities for future improvements to the 
entire pipeline. We already mentioned playing with the variable ordering within 
the variable groups of the symbolic parity game, and considering slightly more 
efficient translations from the symbolic parity game to an explicit game in Oink. 

We could also consider designing a parity game solving algorithm that explic- 
itly results in small strategies. Some solvers might yield a multi-strategy, where 
multiple edges in the parity game can be taken to win the game. This could 
potentially be exploited to simplify the circuits. 

It may also be useful to consider bisimulation minimisation on the parity 
game before solving, and to change the encoding of the states into the BDD, as 
we currently use a naive binary encoding of the state identifiers in the eHOA 
format. There may also be other encoding strategies to obtain the Boolean circuit, 
such as a different encoding of the latches or the approach of [30]. 

Beyond the reactive synthesis of parity automaton specifications, we may 
also explore symbolic techniques, including those outlined in this paper, for the 
synthesis of LTL specifications, building on the preliminary results from our 
earlier prototype described in [13]. 


Abstract. Given a Linear Temporal Logic (LTL) formula over input 
and output variables, reactive synthesis requires us to design a deter- 
ministic Mealy machine that gives the values of outputs at every time 
step for every sequence of inputs, such that the LTL formula is satisfied. 
In this paper, we investigate the notion of dependent variables in the 
context of reactive synthesis. Inspired by successful pre-processing tech- 
niques in Boolean functional synthesis, we define dependent variables in 
reactive synthesis as output variables that are uniquely assigned, given 
an assignment to all other variables and the history so far. We describe 
an automata-based approach for finding a set of dependent variables. Us- 
ing this, we show that dependent variables are surprisingly common in 
reactive synthesis benchmarks. Next, we develop a novel synthesis frame- 
work that exploits dependent variables to construct an overall synthesis 
solution. By implementing this framework using the widely used library 
Spot, we show that reactive synthesis that exploits dependent variables 
can solve some problems beyond the reach of existing techniques. Fur- 
thermore, we observe that among benchmarks with dependent variables, 
if the count of non-dependent variables is low (< 3 in our experiments), 
our method outperforms state-of-the-art tools for synthesis. 


Keywords: Reactive synthesis · Functionally dependent variables · BDDs 


1 Introduction 


Reactive synthesis concerns the design of deterministic transducers (often Mealy 
or Moore machines) that generate a sequence of outputs in response to a sequence 
of inputs such that a given temporal logic specification is satisfied. Church intro- 
duced the problem [12] in 1962, and there has been a rich and storied history of 
work in this area over the past six decades. Recently, it was shown that a form of 
pre-processing, viz. decomposing a Linear Temporal Logic (LTL) specification, 
can lead to significant performance gains in downstream synthesis steps [15]. The 
general idea of pre-processing a specification to simplify synthesis has also been 
used very effectively in the context of Boolean functional synthesis [4,5,17,18,25]. 
Motivated by the success of one such pre-processing step, viz. identification of 
uniquely defined outputs, in Boolean functional synthesis, we introduce the notion 
of dependent outputs in the context of reactive synthesis in this paper. We 
develop its theory and show by means of extensive experiments that dependent 
outputs are common in reactive synthesis benchmarks, and can be effectively 
exploited to obtain synthesis techniques with orthogonal strengths vis-a-vis 
existing state-of-the-art techniques. 

© The Author(s) 2024 


In the context of propositional specifications, it is not uncommon for a spec- 
ification to uniquely define an output variable in terms of the input variables 
and other output variables. A common example of this arises when auxiliary 
variables, called Tseitin variables, are introduced to efficiently convert a specifi- 
cation not in conjunctive normal form (CNF) to one that is in CNF [28]. Being 
able to identify such uniquely defined variables efficiently can be very helpful, 
whether it be for checking satisfiability, for model counting or synthesis. This 
is because these variables do not alter the basic structure or cardinality of the 
solution space of a specification regardless of whether they are projected out 
or not. Hence, one can often simplify the reasoning about the specification by 
ignoring (or projecting out) these variables. In fact, the remarkable practical suc- 
cess of Boolean functional synthesis tools such as Manthan [18] and BFSS [4,5] 
can be partly attributed to efficient techniques for identifying a large number of 
uniquely defined variables. We draw inspiration from these works and embark 
on an investigation into the role of uniquely defined variables, or dependent vari- 
ables, in the context of reactive synthesis. To the best of our knowledge, this is 
the first attempt at directly using dependent variables for reactive synthesis. 


We start by first defining the notion of dependent variables in LTL specifications 
for reactive synthesis. Specifically, given an LTL formula $\varphi$ over a set of 
input variables $I$ and output variables $O$, a set of variables $X \subseteq O$ is said to be 
dependent on a set of variables $Y \subseteq I \cup (O \setminus X)$ in $\varphi$, if at every step of every 
infinite sequence of inputs and outputs satisfying $\varphi$, the finite history of the 
sequence together with the current assignment for $Y$ uniquely defines the current 
assignment for $X$. The above notion of dependency generalizes the notion of 
uniquely defined variables in Boolean functional synthesis, where the value of a 
uniquely defined output at any time is completely determined by the values of 
inputs and (possibly other) outputs at that time. We show that our generalization 
of dependency in the context of reactive synthesis is useful enough to yield 
a synthesis procedure with improved performance vis-a-vis competition-winning 
tools, for a non-trivial number of reactive synthesis benchmarks. 
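
To make the definition concrete, the following sketch checks the dependency condition on a finite set of finite traces. This is only a finite illustration under strong assumptions: the traces stand in for prefixes of sequences satisfying the specification, whereas the paper works with automata over infinite words. The function `is_dependent` and the trace encoding are hypothetical.

```python
def is_dependent(traces, X, Y):
    """Check that (history, current Y-assignment) determines the current
    X-assignment across all given traces.

    Each trace is a list of steps; each step maps variable names to bools.
    X and Y are lists of variable names.
    """
    seen = {}
    for trace in traces:
        history = ()
        for step in trace:
            key = (history, tuple(step[v] for v in Y))
            x_value = tuple(step[v] for v in X)
            # Same history and same Y-values must give the same X-values.
            if seen.setdefault(key, x_value) != x_value:
                return False
            history += (tuple(sorted(step.items())),)
    return True


# Output o repeats the previous input (and is False initially), so the
# history alone already fixes o at every step.
delayed = [
    [{"i": True, "o": False}, {"i": False, "o": True}],
    [{"i": True, "o": False}, {"i": True, "o": True}],
    [{"i": False, "o": False}, {"i": True, "o": False}],
]
# Here o is unconstrained: the same input admits both values of o.
free = [[{"i": True, "o": False}], [{"i": True, "o": True}]]
```

In the first set the output at each step is fixed by what happened before, which is the history-dependence that distinguishes this notion from the purely propositional one.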


We present a novel automata-based technique for identifying a subset-maximal 
set of dependent variables in an LTL specification $\varphi$. Specifically, we convert $\varphi$ 
to a language-equivalent non-deterministic Büchi automaton (NBA) $A_\varphi$, and 
then deploy practically efficient techniques to identify a subset-maximal set of 
outputs $X$ that are dependent on $Y = I \cup (O \setminus X)$. We implemented our method 
to determine the prevalence of dependent variables in existing reactive synthesis 
benchmarks. Our finding shows that out of 1141 benchmarks taken from the 
SYNTCOMP [21] competition, 300 had at least one dependent output variable 
and 26 had all output variables dependent. 


Once a subset-maximal set, say $X$, of dependent variables is identified, we 
proceed with the synthesis process as follows. Referring to the NBA $A_\varphi$ alluded 
to above, we first transform it to an NBA $A'_\varphi$ that accepts the language $L'$ 
obtained from $L(\varphi)$ after removing (or projecting out) the $X$ variables. Our 
experiments show that $A'_\varphi$ is more compactly representable compared to $A_\varphi$, 
when using BDD-based representations of transitions (as is done in state-of-the-art 
tools like Spot [7]). Viewing $A'_\varphi$ as a new (automata-based) specification 
with output variables $O \setminus X$, we now synthesize a transducer $T_Y$ from $A'_\varphi$ using 
standard reactive synthesis techniques. This gives us a strategy $f^Y : \Sigma_I^* \to \Sigma_{O \setminus X}$ 
for each non-dependent variable in $O \setminus X$. Next, we use a novel technique based 
on Boolean functional synthesis to directly construct a circuit that implements a 
transducer $T_X$ that gives a strategy $f^X : \Sigma_Y^* \to \Sigma_X$ for the dependent variables. 
Significantly, this circuit can be constructed in time polynomial in the size of 
the (BDD-based) representation of $A_\varphi$. The transducers $T_Y$ and $T_X$ are finally 
merged to yield an overall transducer $T$ that describes a strategy $f : \Sigma_I^* \to \Sigma_O$ 
solving the synthesis problem for $\varphi$. 

We implemented our approach in a tool called DepSynt. Our tool is developed 
in C++ using APIs from the widely used library Spot for representing and 
manipulating non-deterministic Büchi automata. We performed a comparative 
analysis of our tool with winning entries of the SYNTCOMP [21] competition to 
evaluate how knowledge of dependent variables helps reactive synthesis. Our ex- 
perimental results show that identifying and utilizing dependent variables results 
in improved synthesis performance when the count of non-dependent variables 
is low. Specifically, our tool outperforms state-of-the-art and highly optimized 
synthesis tools on benchmarks that have at least one dependent variable and 
at most 3 non-dependent variables. This leads us to hypothesize that exploiting 
dependent variables benefits synthesis when the count of non-dependent vari- 
ables is below a threshold. Given the preliminary and un-optimized nature of 
our implementation, we believe there is significant scope for improvement. 


Related work. Reactive synthesis has been an extremely active research area for 
the last several decades (see e.g. [9, 12, 15, 16, 24]). Not only is the theoretical 
investigation of the problem rich, there are also several tools that are available 
to solve synthesis problems in practice. These include solutions like ltlsynt [23]
based on Spot [7], Strix [22] and BoSY [14]. Our tool relies heavily on Spot and its 
APIs, which we use liberally to manipulate non-deterministic Büchi automata. 
Our synthesis approach is based on the standard conversion of LTL formula to 
NBA, and then from NBA to deterministic parity automata (DPA) (see [8] for 
an overview of the challenges of reactive synthesis). 

Our work may be viewed as lifting the idea of uniquely defined variables used 
in Boolean functional synthesis to the context of reactive synthesis. Viewed from 
this perspective, our work is not the first to lift ideas from Boolean functional 
synthesis to the reactive context. Following an approach for Boolean functional 
synthesis that decomposes a specification into separate formulas on input vari- 
ables and on output variables [11], the work in [6] constructed a reactive synthe- 
sis tool for specific benchmarks that admit a separation of the specification into 
formulas for only environment variables and formulas for only system variables. 
The current work serves as an additional example in support of the hypothesis
that intuition from Boolean functional synthesis can be helpful and effective in 
the reactive synthesis context. 

The remainder of the paper is structured as follows. We introduce definitions
and notations in Section 2. In Section 3 we define dependent variables for LTL
formulas, and describe an algorithm to find them. In Section 4 we describe our
automata-based synthesis framework and discuss its implementation details in
Section 5. We describe our evaluation in Section 6 and conclude in Section 7.
Missing proofs and additional experiments can be found in the full version [2].


2 Preliminaries 


Given a finite alphabet $\Sigma$, an infinite word $w$ is a sequence $w_0 w_1 w_2 \cdots$ where
for every $i$, the $i$-th letter of $w$, denoted $w_i$, is in $\Sigma$. The prefix
$w_0 \cdots w_i$ (of size $i+1$) of $w$ is denoted by $w[0,i]$. Note that $w[0,0] = w_0$. We
use $w[0,-1]$ to denote the empty word. The set of all infinite words over $\Sigma$ is
denoted by $\Sigma^\omega$. We call $L \subseteq \Sigma^\omega$ a language over infinite words in $\Sigma$. For
our work, the alphabet $\Sigma$ is often the product of two distinct alphabets
$\Sigma_X$ and $\Sigma_Y$, i.e. $\Sigma = \Sigma_X \times \Sigma_Y$. In such cases, for every
$a = (a_1, a_2) \in \Sigma$, we abuse notation and use $a.X$ to denote the projection of
$a$ on $\Sigma_X$, i.e. the letter $a_1 \in \Sigma_X$. Similarly, $a.Y$ denotes the projection
of $a$ on $\Sigma_Y$, i.e. the letter $a_2 \in \Sigma_Y$. For an infinite word $w \in \Sigma^\omega$, we
use $w.X$ to denote the infinite word in $\Sigma_X^\omega$ obtained by projecting each
letter in $w$ on $\Sigma_X$, i.e. $w.X = w_0.X \, w_1.X \cdots$.


Linear Temporal Logic. A Linear Temporal Logic (LTL) formula is constructed
with a finite set of propositional variables $V$, using Boolean operators such as
$\vee$, $\wedge$, and $\neg$, and temporal operators such as next ($X$), until ($U$), etc. The
set $V$ induces an alphabet $\Sigma_V = 2^V$ of all possible assignments (true/false) to
the variables of $V$. The semantics of the operators and satisfiability relation are
defined as usual [20]. The language of an LTL formula $\varphi$, denoted $L(\varphi)$, is the
set of all words in $\Sigma_V^\omega$ that satisfy $\varphi$. For an LTL formula $\varphi$ over $V$, we use
$|V|$ to denote the number of variables in $V$, and $|\varphi|$ to denote the size of the
formula, i.e., the count of its subformulas. For clarity of exposition, we sometimes
abuse notation and identify the singleton variable set $\{z\}$ with $z$. We also use
$\Sigma$ for $\Sigma_V$, when $V$ is clear from the context.


Nondeterministic Büchi Automata. A Nondeterministic Büchi Automaton (NBA)
is a tuple $A = (\Sigma, Q, \delta, q_0, F)$ where $\Sigma$ is the alphabet, $Q$ is a finite set of
states, $\delta : Q \times \Sigma \to 2^Q$ is a non-deterministic transition function, $q_0$ is the
initial state and $F \subseteq Q$ is a set of accepting states. Automaton $A$ can be seen as
a directed labeled graph with vertices $Q$ and an edge $(q, q')$ exists with a label
$a$ if $q' \in \delta(q, a)$. We denote the set of incoming edges to $q$ by $in(q)$ and the set
of outgoing edges from $q$ by $out(q)$. A path in $A$ is a (possibly infinite) sequence
of states $\rho = (q_{i_0}, q_{i_1}, \cdots)$ in which for every $j \geq 0$, $(q_{i_j}, q_{i_{j+1}})$ is an edge
in $A$. A run is a path that starts in $q_0$, and is accepting if it visits a state in $F$
infinitely often. A word $w = \sigma_{i_0} \sigma_{i_1} \cdots$ induces a run $\rho = (q_{i_0}, q_{i_1}, \cdots)$ of $A$
if $q_{i_0} = q_0$ and for every $j \geq 0$, $q_{i_{j+1}} \in \delta(q_{i_j}, \sigma_{i_j})$. Since $A$ is
nondeterministic, a word can have many runs. A word is accepting if it has an
accepting run in $A$. The language $L(A)$ is the set of all accepting words in $A$.
Wlog, we assume that all states and edges that are not a part of any accepting
run (i.e. do not reach a cycle with an accepting state) are removed. This can be
done by a simple pre-processing pass on the NBA. Finally, every LTL formula $\varphi$
can be transformed in time exponential in the size of $\varphi$ to an NBA $A_\varphi$ for which
$L(\varphi) = L(A_\varphi)$ [20,29]. When $\varphi$ is clear from the context we omit the subscript
and refer to $A_\varphi$ as $A$. We denote by $|A|$ the size of an automaton, i.e., the
number of its states and transitions.


Reactive Synthesis. A reactive LTL formula is an LTL formula $\varphi$ over a set of
input variables $I$ and output variables $O$, with $I \cap O = \emptyset$. In reactive synthesis
we are given a reactive LTL formula $\varphi$, and the challenge is to synthesize a
function, called strategy, $f : \Sigma_I^* \to \Sigma_O$ such that every word
$w \in (\Sigma_I \times \Sigma_O)^\omega$ obtained by using this strategy at every time step is in
$L(\varphi)$. If such a strategy exists we say that $\varphi$ is realizable. Otherwise, we say
that $\varphi$ is unrealizable. In what follows, we always consider only reactive LTL
formulas and hence omit the "reactive" prefix while referring to them. The
synthesized strategy $f : \Sigma_I^* \to \Sigma_O$ is typically described (explicitly or
symbolically) as a transducer $T = (\Sigma_I, \Sigma_O, S, s_0, \delta, \lambda)$ in which $\Sigma_I$ and $\Sigma_O$
are the input and output alphabets respectively, $S$ is a set of states with an
initial state $s_0$, $\delta : S \times \Sigma_I \to S$ is a deterministic transition function, and
$\lambda : S \times \Sigma_I \to \Sigma_O$ is the output function. A standard procedure in solving
reactive synthesis is to transform a given LTL formula to an NBA $A_\varphi$ for which
$L(A_\varphi) = L(\varphi)$. Subsequently, $A_\varphi$ is transformed to a Deterministic Parity
Automaton (DPA) that is turned into a parity game, whose solution is described
as a transducer $T_{A_\varphi}$. As the following theorem shows, this approach incurs a
double exponential blowup in the worst-case.
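
To make the transducer definition concrete, the following minimal Python sketch runs a Mealy-style transducer on a finite input sequence. The class name `Transducer`, the dictionary encoding of $\delta$ and $\lambda$, and the toy strategy (output the input seen one step earlier, starting with 0) are illustrative assumptions and not part of the tooling described in this paper.

```python
class Transducer:
    """Explicit Mealy machine T = (Sigma_I, Sigma_O, S, s0, delta, lam)."""

    def __init__(self, s0, delta, lam):
        self.s0 = s0          # initial state
        self.delta = delta    # dict: (state, input letter) -> next state
        self.lam = lam        # dict: (state, input letter) -> output letter

    def run(self, inputs):
        """Return the output letters produced on a finite input prefix."""
        state, outputs = self.s0, []
        for a in inputs:
            outputs.append(self.lam[(state, a)])
            state = self.delta[(state, a)]
        return outputs


# Toy strategy (hypothetical): the state remembers the last input bit,
# and the output at each step is the input seen one step earlier.
delay = Transducer(
    s0=0,
    delta={(s, a): a for s in (0, 1) for a in (0, 1)},
    lam={(s, a): s for s in (0, 1) for a in (0, 1)},
)
```

Calling `delay.run(...)` on a finite list of input bits returns the corresponding outputs, i.e. the values $f$ assigns to each prefix of that input.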


Theorem 1. 1. Reactive synthesis can be solved in $O(2^{2^{n}})$ time, where $n$ is the
size of the LTL formula.
2. Given an NBA $A$ with $n$ states, computing transducer $T_A$ takes $\Omega(2^{n \log n})$ time.


3 Dependent variables in reactive LTL 


We begin by defining dependent variables for (reactive) LTL formulas and pro- 
pose an algorithm for finding a maximal set of dependent variables. While there 
are several notions of dependency that can be considered, we discuss one that 
we have found to be useful in reactive synthesis. Specifically, we require that the 
value of a dependent output variable be completely determined by the values of 
inputs and other output variables and their finite history at every step of the 
interaction between the reactive system and its environment. We consider de- 
pendencies restricted to output variables, since having dependent input variables 
would preclude some input sequences, rendering the specification unrealizable. 


Definition 1 (Variable Dependency in LTL). Let $\varphi$ be an LTL formula
over $V$ with input variables $I \subseteq V$ and output variables $O = V \setminus I$. Let $X, Y$ be
disjoint sets of variables where $X \subseteq O$. We say that $X$ is dependent on $Y$ in $\varphi$
if for every pair of words $w, w' \in L(\varphi)$ and $i \geq 0$, if $w[0, i-1] = w'[0, i-1]$ and
$w_i.Y = w'_i.Y$, then we have $w_i.X = w'_i.X$. Further, we say that $X$ is dependent
in $\varphi$ if $X$ is dependent on $V \setminus X$ in $\varphi$, i.e., it is dependent on all the remaining
variables.


Note that two words in $L(\varphi)$ with different prefixes can have different values
for $X$ for the same values for $Y$, even if $X$ is dependent on $Y$. Also, observe that
if $X$ is dependent on $Y$ in $\varphi$ for some $Y$, then it is also dependent in $\varphi$.

As an example, consider an LTL formula $\varphi$ with input variable $y$ and output
variable $x$. The corresponding input and output alphabets are $\Sigma_Y = \{y, \neg y\}$ and
$\Sigma_X = \{x, \neg x\}$ respectively. Suppose $L(\varphi) = \{w^1, w^2, w^3\}$ where
$w^1 = (y, x)^\omega$, $w^2 = (\neg y, \neg x)^\omega$ and $w^3 = (y, x)(\neg y, x)(y, \neg x)^\omega$. Then $x$ is
dependent on $y$ in $\varphi$. Specifically, note that $w^1[0, 1] \neq w^3[0, 1]$, and hence the
dependency of $x$ is not violated although $w^1_2.y = w^3_2.y$ and $w^1_2.x \neq w^3_2.x$.
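
Definition 1 can be checked mechanically on a finite prefix of explicitly given words. The sketch below is a bounded illustration only: it inspects the first `N` positions of three ultimately periodic words modelled on the example above (reading $w^2$ as $(\neg y, \neg x)^\omega$, which is an assumption about the damaged source text). The helper names `unroll`, `find_violation` and `letter` are hypothetical.

```python
from itertools import combinations


def unroll(prefix, loop, n):
    """First n letters of the ultimately periodic word prefix . loop^omega."""
    letters = list(prefix)
    while len(letters) < n:
        letters.extend(loop)
    return letters[:n]


def find_violation(words, X, Y, bound):
    """Search the first `bound` positions for a witness that X is not
    dependent on Y; return (i, w, w2), or None if no witness is found."""
    for w, w2 in combinations(words, 2):
        for i in range(bound):
            if w[:i] != w2[:i]:
                break  # the prefixes differ, and will differ for every larger i
            same_y = all(w[i][v] == w2[i][v] for v in Y)
            same_x = all(w[i][v] == w2[i][v] for v in X)
            if same_y and not same_x:
                return i, w, w2
    return None


def letter(y, x):
    """A letter over input y and output x, as a dict var -> bool."""
    return {"y": y, "x": x}


N = 6  # number of positions inspected (an arbitrary bound for this sketch)
w1 = unroll([], [letter(True, True)], N)
w2 = unroll([], [letter(False, False)], N)
w3 = unroll([letter(True, True), letter(False, True)], [letter(True, False)], N)
words = [w1, w2, w3]
```

On these words, `find_violation(words, ["x"], ["y"], N)` finds no witness, while dropping `y` from the dependency set produces one at position 0 (from `w1` and `w2`), which matches the intuition that `x` needs `y` to be determined.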


3.1 Maximally dependent sets of variables. Given an LTL formula $\varphi(I, O)$,
we say that a set $X \subseteq O$ is a maximal dependent set in $\varphi$ if $X$ is dependent
in $\varphi$ and every set of outputs that strictly contains $X$ is not dependent in $\varphi$.
As in the propositional case [27], finding maximum or minimum dependent sets
is intractable, hence we focus on subset-maximality. Given a variable $z$ and a
set $Y$, a check of whether $z$ is dependent on $Y$ can easily be used to find a
maximal dependent set. Indeed, we would just need to start from the empty
set and iterate over output variables, checking for each if it is dependent on
the remaining variables. We give the pseudocode for this in [2]. Note that when
not all output variables are dependent, the order in which output variables are
chosen may play a significant role in the size of the maximal set obtained. We
currently use a naive ordering (first appearance), and leave the problem of better
heuristics for getting larger maximal dependent sets to future work.
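
The greedy iteration described above can be sketched as follows. This is one plausible reading of the procedure, not the pseudocode of [2]: the function names are hypothetical, and the oracle used here is a simple propositional stand-in (functional determination over a finite set of assignments) rather than the automata-based check of Section 3.2.

```python
from itertools import product


def maximal_dependent_set(outputs, all_vars, is_dependent):
    """Greedy pass: add z whenever z is dependent on every variable that is
    neither z itself nor already marked dependent."""
    X = []
    for z in outputs:                      # naive ordering: as given
        rest = set(all_vars) - set(X) - {z}
        if is_dependent(z, rest):
            X.append(z)
    return X


def make_oracle(allowed):
    """Toy oracle over a finite set of assignments (dicts var -> bool):
    z is dependent on Y if the values of Y determine the value of z."""
    def is_dependent(z, Y):
        seen = {}
        for a in allowed:
            key = tuple((v, a[v]) for v in sorted(Y))
            if seen.setdefault(key, a[z]) != a[z]:
                return False
        return True
    return is_dependent


# Hypothetical one-step specification over input i and outputs a, b: a == b.
VARS = ("i", "a", "b")
allowed = [dict(zip(VARS, bits))
           for bits in product((False, True), repeat=3)
           if bits[1] == bits[2]]
oracle = make_oracle(allowed)
```

With the order `["a", "b"]` the sketch returns `["a"]`, and with `["b", "a"]` it returns `["b"]`: different orders yield different maximal sets, which is why the choice of ordering heuristic matters.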


3.2 Finding dependent variables via automata. As explained above, the
heart of the dependency check is to verify whether a given output variable is
dependent on a set of other variables. We now develop an approach for doing
so based on the nondeterministic Büchi automaton $A_\varphi$ that represents the same
language as the LTL formula $\varphi$. Our framework uses the notion of compatible
pairs of states of the automaton:


Definition 2. Let $A = (\Sigma, Q, \delta, q_0, F)$ be an NBA with states $s, s'$ in $Q$. Then
the pair $(s, s')$ is compatible in $A$ if there are runs from $q_0$ to $s$ and from $q_0$ to
$s'$ on the same word $w \in \Sigma^*$.


Recall that in our definition, only states and edges that are part of an accepting 
run exist in A. Then we have the following definition. 


Definition 3. Let $\varphi$ be an LTL formula over $V$ with input variables $I \subseteq V$ and
output variables $O = V \setminus I$. Let $X, Y$ be disjoint sets of variables where $X \subseteq O$.
Let $A_\varphi$ be an NBA that describes $\varphi$. We say that $X$ is automata dependent on
$Y$ in $A_\varphi$, if for every pair of compatible states $s, s'$ and assignments $\sigma, \sigma'$ for
$V$, where $\sigma.Y = \sigma'.Y$ and $\sigma.X \neq \sigma'.X$, $\delta(s, \sigma)$ and $\delta(s', \sigma')$ cannot both
exist in $A_\varphi$. We say that $X$ is automata dependent in $A_\varphi$ if $X$ is automata
dependent on $Y$ in $A_\varphi$ and $Y = V \setminus X$.




As an example, consider NBA $A_1$ in Figure 1, constructed from some LTL
formula with input $I = \{i\}$ and outputs $O = \{o_1, o_2\}$. For notational simplicity,
we use $\Sigma_I = \{0, 1\}$, $\Sigma_O = \{0, 1\}^2$, and edges are labeled by values of $(i, o_1 o_2)$.
It is easy to see that $(q_0, q_0), (q_1, q_1)$ are compatible pairs, but so are
$(q_0, q_1), (q_1, q_0)$ since both $q_0$ and $q_1$ can be reached from the initial state on
reading the word $(0, 00)(0, 00)$ of length 2. Now consider output $o_1$. It is not
dependent on $\{i\}$, i.e., only the input, since from $q_0$ with $i = 0$, we can go to
different states with different values of $o_1$. But $o_1$ is indeed dependent on
$\{i, o_2\}$. To see this consider every pair of compatible states, in this case all
pairs. Then if we fix the values of $i$ and $o_2$, there is a unique value of $o_1$ that
permits state transitions to happen from the compatible pair. For example,
regardless of which state we are in, if $i = 0, o_2 = 0$, $o_1$ must be 0 for a state
transition to happen. On the other hand, $o_2$ is not dependent on either $\{i\}$ or
$\{i, o_1\}$ (as can be seen from $(q_0, q_1)$ with $i = 1, o_1 = 1$). The following theorem
relates automata-based dependency and dependency in LTL (for proof, see [2]),
allowing us to focus only on the former.


Theorem 2. Let $\varphi$ be an LTL formula with set of variables $V = I \cup O$, where
$X \subseteq O$ and $Y \subseteq I \cup (O \setminus X)$. Let $A_\varphi$ be an NBA with $L(\varphi) = L(A_\varphi)$. Then $X$
is dependent on $Y$ in $\varphi$ if and only if $X$ is automata dependent on $Y$ in $A_\varphi$.


Finding Compatible States. We find all compatible states in an automaton in
Algorithm 1 as follows. We maintain a list of in-process compatible pairs $C$ that
is initialized with $(q_0, q_0)$, an undoubtedly compatible pair. At each step, until
$C$ becomes empty, we pick a pair $(s_i, s_j) \in C$, add it to the compatible pair set
$P$, and remove it from $C$ (in line 4). Then (in lines 5-8), we check (in line 6) if
outgoing transitions from $(s_i, s_j)$ lead to a new pair $(s'_i, s'_j)$ not already in $P$
or $C$, that can be reached on reading the same letter $\sigma$. If so, we add this pair
to the in-process set $C$. All pairs that we add to $P$, $C$ are indeed compatible,
and nothing is removed from $P$. When the algorithm terminates, $C$ is empty,
which means all possible ways (from the initial state pair) to reach a compatible
pair have been explored, thus showing correctness.

Fig. 1. An Example NBA $A_1$

Finally, we show how to check dependency using automata, by implementing
procedure isAutomataDependent, shown in Algorithm 2. This procedure
takes an NBA $A_\varphi$, a candidate dependent output $z$ and a candidate dependency
set $Y \subseteq V \setminus \{z\}$ as inputs, and tries to find a witness to $z$ not being dependent
on $Y$. If no such witness exists, then $z$ is declared as being dependent on $Y$.
Procedure isAutomataDependent first uses Algorithm 1 to construct a list $P$
of all compatible pairs in $A_\varphi$ (line 4). Then for every pair $(s, s') \in P$, the
algorithm checks using procedure AreStatesColliding (lines 1-2) whether there
exist assignments $\sigma, \sigma'$ for which both $\delta(s, \sigma)$ and $\delta(s', \sigma')$ exist,
$\sigma.Y = \sigma'.Y$ and $\sigma.\{z\} \neq \sigma'.\{z\}$. If so, $z$ is not dependent on $Y$ (line 7) and
the algorithm returns false. Otherwise, after checking all the pairs, the algorithm
returns true.

Algorithm 1 Find All Compatible States in NBA
Input: NBA $A_\varphi = (\Sigma, Q, \delta, q_0, F)$ of $\varphi$.
Output: Set $P \subseteq Q \times Q$ of all compatible state pairs in $A_\varphi$

 1: $P \leftarrow \emptyset$; $C \leftarrow \{(q_0, q_0)\}$
 2: while $C \neq \emptyset$ do
 3:     Let $(s_i, s_j) \in C$
 4:     $P \leftarrow P \cup \{(s_i, s_j)\}$; $C \leftarrow C \setminus \{(s_i, s_j)\}$
 5:     for $(s'_i, s'_j) \in out(s_i) \times out(s_j)$ do
 6:         if $(s'_i, s'_j) \notin P \cup C$ and $\exists \sigma \in 2^V$ s.t. $s'_i \in \delta(s_i, \sigma) \wedge s'_j \in \delta(s_j, \sigma)$ then
 7:             $C \leftarrow C \cup \{(s'_i, s'_j)\}$
 8:         end if
 9:     end for
10: end while
11: return $P$
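
Algorithm 1 is a worklist exploration of the product of the automaton with itself. A minimal Python sketch, assuming an explicit (non-symbolic) encoding of $\delta$ as nested dictionaries, could look as follows. The function name, the encoding and the toy automaton are assumptions made for illustration; the toy automaton ignores acceptance and is not the automaton of Fig. 1.

```python
from collections import deque


def find_compatible_pairs(delta, q0):
    """delta: dict state -> dict letter -> set of successor states.
    Returns the set P of compatible state pairs."""
    P = set()
    C = deque([(q0, q0)])          # in-process pairs
    seen = {(q0, q0)}              # everything already in P or C
    while C:
        s, t = C.popleft()
        P.add((s, t))
        for sigma, succ_s in delta.get(s, {}).items():
            succ_t = delta.get(t, {}).get(sigma, ())
            for s2 in succ_s:
                for t2 in succ_t:  # both moves read the same letter sigma
                    if (s2, t2) not in seen:
                        seen.add((s2, t2))
                        C.append((s2, t2))
    return P


# Hypothetical toy automaton over letters "a", "b", "c" (acceptance ignored).
delta = {
    "q0": {"a": {"q0", "q1"}, "c": {"q3"}},
    "q1": {"b": {"q1", "q2"}},
}
pairs = find_compatible_pairs(delta, "q0")
```

Here `("q0", "q1")` is compatible because both states are reached on the word `a`, whereas `("q2", "q3")` is not, since no single word leads to both states.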


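Algorithm 2 Check Dependency Based Automaton
Input: NBA $A_\varphi = (\Sigma, Q, \delta, q_0, F)$ from $\varphi$, candidate dependent variable $z$,
candidate dependency set $Y$.
Output: Is $z$ dependent on $Y$ by Definition 3

 1: procedure AreStatesColliding($p$, $q$)
 2:     return $\exists \sigma_p, \sigma_q \in 2^V$ s.t. $\delta(p, \sigma_p) \neq \emptyset \wedge \delta(q, \sigma_q) \neq \emptyset \wedge \sigma_p.Y = \sigma_q.Y \wedge \sigma_p.\{z\} \neq \sigma_q.\{z\}$
 3: end procedure
 4: $P \leftarrow$ FindAllCompatibleStates($A_\varphi$)
 5: for $(s_1, s_2) \in P$ do
 6:     if AreStatesColliding($s_1$, $s_2$) then
 7:         return False
 8:     end if
 9: end for
10: return True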


Lemma 1. Algorithm 2 returns True if and only if $z$ is automata-dependent on
$Y$ in $A_\varphi$.


Using the above algorithm to perform dependency check, it is easy to compute 
a maximal set of dependent variables (as explained earlier). Note that all the 
above algorithms run in time polynomial (in fact, quadratic) in size of the NBA. 
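
The dependency check of Algorithm 2 can be sketched in Python on an explicit automaton whose letters are tuples of bits. This is an illustrative, enumeration-based version under assumed encodings (the names `compatible_pairs` and `is_automata_dependent` and the toy automaton are hypothetical); the tool described later uses ROBDD operations instead.

```python
from itertools import product


def compatible_pairs(delta, q0):
    """Pairs of states reachable from q0 on a common finite word."""
    seen, todo = {(q0, q0)}, [(q0, q0)]
    while todo:
        s, t = todo.pop()
        for sigma in delta.get(s, {}).keys() & delta.get(t, {}).keys():
            for pair in product(delta[s][sigma], delta[t][sigma]):
                if pair not in seen:
                    seen.add(pair)
                    todo.append(pair)
    return seen


def is_automata_dependent(delta, q0, variables, z, Y):
    """Return False iff some compatible pair has enabled letters that agree
    on Y but disagree on z (a collision)."""
    zi = variables.index(z)
    yi = [variables.index(y) for y in Y]
    for s, t in compatible_pairs(delta, q0):
        for a, b in product(delta.get(s, {}), delta.get(t, {})):
            if all(a[k] == b[k] for k in yi) and a[zi] != b[zi]:
                return False
    return True


# Hypothetical automaton over (i, o1, o2): every enabled letter has o1 == i.
VARS = ("i", "o1", "o2")
letters = [l for l in product((0, 1), repeat=3) if l[1] == l[0]]
delta = {q: {l: {"q0", "q1"} for l in letters} for q in ("q0", "q1")}
```

In this toy automaton `o1` is automata dependent on `{i}` but not on `{o2}`, and `o2` is not dependent on `{i, o1}`.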


Corollary 1. Given NBA $A_\varphi$, a maximal dependent set of outputs can be
computed in time polynomial in the size of $A_\varphi$.


Note that if all output variables are dependent, then regardless of the order in 
which the outputs are considered, for every finite history of inputs, there is a 
unique value for each output that makes the specification true. Therefore, there 
is a unique winning strategy for the specification, assuming it is realizable. 

Fig. 2. Synthesis using dependencies. Note that Steps 2, 3 and 5 are novel, while
Steps 1, 4 and 6 (shaded in gray) use pre-existing techniques.

4 Exploiting Dependency in Reactive Synthesis 


In this section, we explain how dependencies can be beneficially exploited in a 
reactive synthesis pipeline. Our approach can be described at a high level as 
shown in Figure 2. This flow-chart has the following 6 steps: 


1. Given an LTL formula $\varphi$ over a set of variables $V$ with input variables $I \subseteq V$
   and output variables $O = V \setminus I$, we first construct a language-equivalent NBA
   $A_\varphi = (\Sigma_I \times \Sigma_O, S, s_0, \delta, F)$ by standard means, e.g. [29].

2. Then, as described in Section 3, we find in $A_\varphi$ a maximal set of output
   variables $X$ that are dependent in $\varphi$. For notational convenience, in the
   remainder of the discussion, we use $Y$ for $I \cup (O \setminus X)$ and $\Sigma_Y$ for
   $\Sigma_I \times \Sigma_{O \setminus X}$.

3. Next, we construct an NBA $A'_\varphi$ from $A_\varphi$ by projecting out (or eliminating)
   all $X$ variables from labels of transitions. Thus, $A'_\varphi$ has the same sets of
   states and transitions as $A_\varphi$. We simply remove valuations of variables in $X$
   from the label of every state transition in $A_\varphi$ to obtain $A'_\varphi$. Note that after
   this step, $L(A'_\varphi) = \{w \mid \exists u \in L(A_\varphi) \text{ s.t. } w = u.Y\} \subseteq \Sigma_Y^\omega$.

4. Treating $A'_\varphi$ as an (automata-based) specification with inputs $I$ and outputs
   $O \setminus X$, we next use existing reactive synthesis techniques (e.g., [8]) to obtain
   a transducer $T_Y$ that describes a strategy $f_Y : \Sigma_I^* \to \Sigma_{O \setminus X}$ for $L(A'_\varphi)$.

5. We also construct a transducer $T_X$ that describes a function
   $f_X : \Sigma_Y^* \to \Sigma_X$ with the following property: for every word $w' \in L(A'_\varphi)$
   there exists a unique word $w \in L(\varphi)$ such that $w.Y = w'$ and for all $i$,
   $w_i.X = f_X(w'[0, i])$.

6. Finally, we compose $T_X$ and $T_Y$ to construct a transducer $T$ that defines the
   final strategy $f : \Sigma_I^* \to \Sigma_O$. Recall that transducer $T_Y$ has $I$ as inputs and
   $O \setminus X$ as outputs, while transducer $T_X$ has $I$ and $O \setminus X$ as inputs and $X$ as
   outputs. Composing $T_X$ and $T_Y$ is done by simply connecting the outputs
   $O \setminus X$ of $T_Y$ to the corresponding inputs of $T_X$.


In the above flow, we use standard techniques from the literature for Steps 1 
and 4, as explained above. Hence we do not dwell on these steps in detail. Step 
2 was detailed in Section 3. Step 3 is easy when we have an explicit representa- 
tion of the automata, but it has interesting consequences when using symbolic 
representations of automata. Step 6 is also easy to implement. Hence, in the 
remainder of this section, we focus on Step 5, a key contribution of this paper. 
In the next section, we will discuss how steps 2, 3 and 5 are implemented using 
symbolic representations (viz. ROBDDs). 

Constructing transducer $T_X$. Let $A = (\Sigma_I \times \Sigma_O, Q, \delta, q_0, F)$ be the NBA $A_\varphi$
obtained in step 1 of the pipeline shown above. Since each letter in $\Sigma_O$ can be
thought of as a pair $(\sigma, \sigma')$, where $\sigma \in \Sigma_{O \setminus X}$ and $\sigma' \in \Sigma_X$, the transition
function $\delta$ can be viewed as a map from $Q \times (\Sigma_I \times \Sigma_{O \setminus X} \times \Sigma_X)$ to $2^Q$. The
transducer $T_X$ we wish to construct is a deterministic Mealy machine described
by the 6-tuple $(\Sigma_Y, \Sigma_X \cup \{\bot\}, Q^X, q_0^X, \delta^X, \lambda^X)$, where
$\Sigma_Y = \Sigma_I \times \Sigma_{O \setminus X}$ is the input alphabet, $\Sigma_X$ is the output alphabet with
$\bot \notin \Sigma_X$ being a special symbol that is output when no symbol of $\Sigma_X$
suffices, $Q^X = 2^Q$, that is, the powerset of $Q$, is the set of states of $T_X$,
$q_0^X = \{q_0\}$ is the initial state, $\delta^X : Q^X \times (\Sigma_I \times \Sigma_{O \setminus X}) \to Q^X$ is the state
transition function, and $\lambda^X : Q^X \times (\Sigma_I \times \Sigma_{O \setminus X}) \to \Sigma_X \cup \{\bot\}$ is the
output function. The state transition function $\delta^X$ is defined by the Rabin-Scott
subset construction applied to the automaton $A'_\varphi$ [19]. Formally, for every
$U \subseteq Q$, $\sigma_I \in \Sigma_I$ and $\sigma \in \Sigma_{O \setminus X}$, we define
$\delta^X(U, (\sigma_I, \sigma)) = \{q' \mid q' \in Q, \exists q \in U \text{ and } \exists \sigma' \in \Sigma_X \text{ s.t. } q' \in \delta(q, (\sigma_I, \sigma, \sigma'))\}$.
Before defining the output function $\lambda^X$, we state an important property of
$T_X$ that follows from the definition of $\delta^X$ above.


Lemma 2. If $X$ is automata dependent in $A_\varphi$, then every state $U$ reachable
from $q_0^X$ in $T_X$ satisfies the property: $\forall q, q' \in U$, $(q, q')$ is compatible in $A_\varphi$.


The lemma is easily proved by induction on the number of steps needed to
reach $U$ from $q_0^X$. Details of the proof may be found in [2]. We are now ready
to define the output function $\lambda^X$ of $T_X$. Let $U$ be a state reachable from $q_0^X$
in $T_X$ and let $U' = \delta^X(U, (\sigma_I, \sigma))$, where $(\sigma_I, \sigma) \in \Sigma_Y$. If $U' \neq \emptyset$, we can
infer (see Proof of Lemma 2 in [2]) that there is a unique $\sigma_X \in \Sigma_X$ s.t.
$U' = \{q' \mid \exists q \in U \text{ s.t. } q' \in \delta(q, (\sigma_I, \sigma, \sigma_X))\}$. We define
$\lambda^X(U, (\sigma_I, \sigma)) = \sigma_X$ in this case. If, on the other hand, $U' = \emptyset$, we define
$\lambda^X(U, (\sigma_I, \sigma)) = \bot$.
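
The subset construction with outputs described above can be sketched on an explicit automaton as follows. This is a minimal illustration under assumed encodings, not the symbolic construction of Section 5: letters are pairs `(sigma_y, sigma_x)`, `None` stands for the symbol $\bot$, and the function names and the toy automaton are hypothetical.

```python
def tx_step(delta, U, sigma_y, x_values):
    """One step of T_X from state U on input sigma_y.
    delta: dict state -> dict (sigma_y, sigma_x) -> set of successors.
    Returns (U', sigma_x); sigma_x is None when U' is empty."""
    successors, used_x = set(), set()
    for q in U:
        for x in x_values:
            targets = delta.get(q, {}).get((sigma_y, x), set())
            if targets:
                successors |= targets
                used_x.add(x)
    if not successors:
        return frozenset(), None
    if len(used_x) != 1:
        raise ValueError("X is not dependent: the output is not unique")
    return frozenset(successors), used_x.pop()


def run_tx(delta, q0, x_values, inputs):
    """Outputs of T_X on a finite input sequence, starting from {q0}."""
    U, outputs = frozenset({q0}), []
    for sigma_y in inputs:
        U, x = tx_step(delta, U, sigma_y, x_values)
        outputs.append(x)
    return outputs


# Hypothetical automaton: one input bit i, one dependent output x with x == i.
delta = {
    "q0": {(0, 0): {"q0"}, (1, 1): {"q1"}},
    "q1": {(0, 0): {"q0", "q1"}},
}
```

Running `run_tx(delta, "q0", (0, 1), [1, 0, 1, 1, 0])` follows the subsets `{q1}`, `{q0, q1}`, `{q1}` and then reaches the empty set, after which every output is `None`.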


Theorem 3. If $\varphi$ is realizable, the transducer $T$ obtained by composing $T_X$ and
$T_Y$ as in step 6 of Fig. 2 solves the synthesis problem for $\varphi$.


An interesting corollary of the above result is that for realizable specifications
with all output variables dependent, we can solve the synthesis problem in time
$O(2^k)$ instead of $\Omega(2^{k \log k})$, where $k = |A_\varphi|$. This is because the subset
construction on $A'_\varphi$ suffices to obtain $T_X$, while $A_\varphi$ must be converted to a
deterministic parity automaton to solve the synthesis problem in general.


5 Symbolic Implementation 


In this section, we describe symbolic implementations of each of the non-shaded 
steps in the synthesis flow depicted in Fig. 2. Before we delve into the details, a 
note on the representation of NBAs is relevant. We use the same representation 
as used in Spot [7], a state-of-the-art platform for representing and manipulating
LTL formulas and $\omega$-automata. Specifically, the transition structure of an NBA
$A$ is represented as a directed graph, with nodes representing states of $A$, and
directed edges representing state transitions. Furthermore, every edge from state
$s$ to state $s'$ is labeled by a Boolean function $B_{(s,s')}$ over $I \cup O$. The Boolean
function can itself be represented in several forms. We assume it is represented
as a Reduced Ordered Binary Decision Diagram (ROBDD) [10], as is done in
Spot. Each such labeled edge represents a set of state transitions from $s$ to $s'$,
with one transition for each satisfying assignment of $B_{(s,s')}$.


Implementing Algorithms 1 and 2 (Step 2): Since states of the NBA $A_\varphi$
are explicitly represented as nodes of a graph, it is straightforward to implement
Algorithms 1 and 2. The check in line 6 of Algorithm 1 is implemented by
checking the satisfiability of $B_{(s_i, s'_i)}(I, O) \wedge B_{(s_j, s'_j)}(I, O)$ using ROBDD
operations. Similarly, the check in line 2 of Algorithm 2 is implemented by
checking the satisfiability of
$\bigvee_{(s, s') \in out(p) \times out(q)} B_{(p,s)}(I, O) \wedge B_{(q,s')}(I', O') \wedge \bigwedge_{y \in Y} (y \leftrightarrow y') \wedge (z \leftrightarrow \neg z')$
using ROBDD operations. In the above formula, $I'$ (resp. $O'$)
denotes a set of fresh, primed copies of variables in $I$ (resp. $O$).


Implementing transformation of $A_\varphi$ to $A'_\varphi$ (Step 3): To obtain $A'_\varphi$, we
simply replace the ROBDD for $B_{(s,s')}$ on every edge $(s, s')$ of the NBA $A_\varphi$ by an
ROBDD for $\exists X\, B_{(s,s')}$. While the worst-case complexity of computing
$\exists X\, B_{(s,s')}$ using ROBDDs is exponential in $|X|$, this doesn't lead to
inefficiencies in practice because $|X|$ is typically small. Indeed, our experiments
reveal that the total size of ROBDDs in the representation of $A'_\varphi$ is invariably
smaller, sometimes significantly, compared to the total size of ROBDDs in the
representation of $A_\varphi$. In fact, this reduction can be exponential in some cases,
as the following proposition shows (see proof in [2]).


Proposition 1. There exists an NBA $A_\varphi$ with a single dependent output such
that the ROBDD labeling its edge is exponentially (in the number of inputs and
outputs) larger than that labeling the edge of $A'_\varphi$.


Implementing transducer $T_X$ (Step 5): We now describe how to construct a
Mealy machine corresponding to the transducer $T_X$. As explained in the previous
section, the transition structure of the Mealy machine is obtained by applying
the subset construction to $A'_\varphi$. While this requires $O(2^{|A_\varphi|})$ time if states and
transitions are explicitly represented, we show below that a sequential circuit
implementing the Mealy machine can be constructed directly from $A_\varphi$ in time
polynomial in $|X|$ and $|A_\varphi|$. This reduction in construction complexity crucially
relies on the fact that all variables in $X$ are dependent on $I \cup (O \setminus X)$.

Let $S = \{s_0, \ldots, s_{k-1}\}$ be the set of states of $A_\varphi$, and let $in(s_i)$ denote the
set of states that have an outgoing transition to $s_i$ in $A_\varphi$. To implement the
desired Mealy machine, we construct a sequential circuit with $k$ state-holding
flip-flops. Every state $U$ ($\subseteq S$) of the Mealy machine is represented by the state
of these $k$ flip-flops, i.e. by a $k$-dimensional Boolean vector. Specifically, the
$i$-th component is set to 1 iff $s_i \in U$. For example, if $S = \{s_0, s_1, s_2\}$ and
$U = \{s_0, s_2\}$, then $U$ is represented by the vector $\langle 1, 0, 1 \rangle$. Let $n_i$ and $p_i$
denote the next-state input and present-state output of the $i$-th flip-flop. The
next-state function $\delta^X$ from the $p_i$'s to the $n_i$'s of the Mealy machine is
implemented by a circuit, say $\Delta^X$, with inputs $\{p_0, \ldots, p_{k-1}\} \cup I \cup (O \setminus X)$ and
outputs $\{n_0, \ldots, n_{k-1}\}$. For $i \in \{0, \ldots, k-1\}$, output $n_i$ of this circuit
implements the Boolean function $\bigvee_{s_j \in in(s_i)} \big(p_j \wedge \exists X\, B_{(s_j, s_i)}\big)$. To see why
this works, suppose $(p_0, \ldots, p_{k-1})$ represents the current state $U \subseteq S$ of the
Mealy machine. Then the above function sets $n_i$ to true iff there is a state
$s_j \in U$ (i.e. $p_j = 1$) s.t. there is a transition from $s_j$ to $s_i$ on some values of
outputs $X$ and for the given values of $I \cup (O \setminus X)$ (i.e. $\exists X\, B_{(s_j, s_i)} = 1$).
This is exactly the condition for $s_i$ to be present in the state $U' \subseteq S$ reached
from $U$ for the given values of $I \cup (O \setminus X)$ in the Mealy machine obtained by
subset construction.
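
The next-state logic can be illustrated with a small Python sketch in which edge labels are plain predicates and existential quantification over $X$ is done by brute-force enumeration. This stands in for the ROBDD/DNNF-based construction and is exponential in $|X|$, so it is only meant to show what each $n_i$ computes. The function names and the toy automaton are assumptions.

```python
from itertools import product


def exists_x(label, x_vars, env):
    """Evaluate (exists X. label) under env, an assignment to I and O minus X.
    Brute-force enumeration stands in for ROBDD quantification."""
    return any(
        label({**env, **dict(zip(x_vars, bits))})
        for bits in product((False, True), repeat=len(x_vars))
    )


def next_state(p, edges, x_vars, env):
    """n_i = OR over edges (s_j, s_i) of (p_j AND exists X. B_(s_j, s_i)).
    p: list of present-state bits; edges: dict (j, i) -> label predicate."""
    n = [False] * len(p)
    for (j, i), label in edges.items():
        if p[j] and exists_x(label, x_vars, env):
            n[i] = True
    return n


# Hypothetical 2-state automaton with input i and dependent output x.
edges = {
    (0, 0): lambda e: e["x"] == e["i"],
    (0, 1): lambda e: e["i"] and e["x"],
    (1, 1): lambda e: not e["i"] and not e["x"],
}
```

For instance, from the state vector `[True, False]` with `i` set to true, both next-state bits become true, since both edges leaving state 0 are enabled for some value of `x`.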

It is known from the knowledge compilation literature (see e.g. [1,4, 13]) that 
every ROBDD can be compiled in linear time to a Boolean circuit in Decom- 
posable Negation Normal Form (DNNF), and that every DNNF circuit admits 
linear time projection of variables, yielding a resultant DNNF circuit. Hence, a
Boolean circuit for $\exists X\, B_{(s_j, s_i)}$ can be constructed in time linear in the size of
the ROBDD representation of $B_{(s_j, s_i)}$. This allows us to construct the circuit
$\Delta^X$, implementing the next-state transition logic of our Mealy machine, in time
(and space) linear in $|X|$ and $|A_\varphi|$.

Next, we turn to constructing a circuit $\Lambda^X$ that implements the output
function $\lambda^X$ of our Mealy machine. It is clear that $\Lambda^X$ must have inputs
$\{p_0, \ldots, p_{k-1}\} \cup I \cup (O \setminus X)$ and outputs $X$. Since $X$ is automata dependent on
$I \cup (O \setminus X)$ in $A_\varphi$, the following proposition is easily seen to hold.

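Proposition 2. Let $B_{(s,s')}$ be a Boolean function with support $I \cup O$ that labels
a transition $(s, s')$ in $A_\varphi$. For every $(\sigma_I, \sigma) \in \Sigma_I \times \Sigma_{O \setminus X}$, if
$(\sigma_I, \sigma) \models \exists X\, B_{(s,s')}$, then there is a unique $\sigma' \in \Sigma_X$ such that
$(\sigma_I, \sigma, \sigma') \models B_{(s,s')}$.
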


Considering only the transition $(s, s')$ referred to in Proposition 2, we first discuss
how to synthesize a vector of Boolean functions, say
$F^{(s,s')} = \langle F_1^{(s,s')}, \ldots, F_{|X|}^{(s,s')} \rangle$, where each component function has support
$I \cup (O \setminus X)$, such that $F^{(s,s')}[I \mapsto \sigma_I][O \setminus X \mapsto \sigma] = \sigma'$. Generalizing beyond
the specific assignment of $I \cup O$, our task effectively reduces to synthesizing an
$|X|$-dimensional vector of Boolean functions $F^{(s,s')}$ s.t.
$\forall\, I \cup (O \setminus X)\; \big(\exists X\, B_{(s,s')} \rightarrow B_{(s,s')}[X \mapsto F^{(s,s')}]\big)$ holds.
Interestingly, this is an instance of Boolean functional synthesis, a problem
that has been extensively studied in the recent past (see e.g. [1,3,4,6,11]). In
fact, we know from [1,26] that if $B_{(s,s')}$ is represented as an ROBDD, then a
Boolean circuit for $F^{(s,s')}$ can be constructed in $O(|X|^2 \cdot |B_{(s,s')}|)$ time, where
$|B_{(s,s')}|$ denotes the size of the ROBDD for $B_{(s,s')}$. For every $x_i \in X$, we use this
technique to construct a Boolean circuit for $F_i^{(s,s')}$ for every edge $(s, s')$ in $A_\varphi$.
The overall circuit $\Lambda^X$ is constructed such that the output for $x_i \in X$ implements
the function
$\bigvee_{\text{transition } (s,s') \text{ in } A_\varphi} \big(p_s \wedge B_{(s,s')}[X \mapsto F^{(s,s')}] \wedge F_i^{(s,s')}\big)$.

Lemma 3. Let $U \subseteq S$ be a non-empty set of pairwise compatible states of $A_\varphi$.
For $(\sigma_I, \sigma) \in \Sigma_I \times \Sigma_{O \setminus X}$, if $\delta^X(U, (\sigma_I, \sigma)) \neq \emptyset$, then the outputs $X$ of
$\Lambda^X$ evaluate to $\lambda^X(U, (\sigma_I, \sigma))$. In all other cases, every output of $\Lambda^X$
evaluates to 0.

Note that $\delta^X(U, (\sigma_I, \sigma)) = \emptyset$ iff all outputs $n_i$ of the circuit $\Delta^X$ evaluate to
0. This case can be easily detected by checking if $\bigvee_{i=0}^{k-1} n_i$ evaluates to 0. We
therefore have the following result.

Theorem 4. The sequential circuit obtained with $\Delta^X$ as next-state function and
$\Lambda^X$ as output function is a correct implementation of transducer $T_X$, assuming
(a) the initial state is $p_0 = 1$ and $p_j = 0$ for all $j \in \{1, \ldots, k-1\}$, and (b) the
output is interpreted as $\bot$ whenever $\bigvee_{i=0}^{k-1} n_i$ evaluates to 0.


6 Experiments and Evaluation 


We implemented the synthesis pipeline depicted in Figure 2 in a tool called
DepSynt (accessible at https://github.com/eliyaoo32/DepSynt), using the
symbolic approach of Section 5. For Steps 1 and 4 of the pipeline, i.e., construction
of $A_\varphi$ and synthesis of $T_Y$, we used the tool Spot [7], a widely used library for
representing and manipulating NBAs. We then experimented with all available
reactive synthesis benchmarks from the SYNTCOMP [21] competition, a total 
of 1,141 LTL specifications over 31 benchmark families. 

All our experiments were run on a computer cluster, with each problem in- 
stance run on an Intel Xeon Gold 6130 CPU clocking at 2.1 GHz with 2GB 
memory and running Rocky Linux 8.6. Our investigation was focussed on an- 
swering two main research questions: 

RQ1: How prevalent are dependent outputs in reactive synthesis benchmarks? 
RQ2: Under what conditions, if any, is reactive synthesis benefited by our ap- 
proach, i.e., of identifying and separately processing dependent output variables? 


Dependency Prevalence. To answer RQ1, we implemented the algorithm in
Section 3 and executed it with a timeout of 1 hour. Within this time, we found
that 300 of the 1,141 SYNTCOMP benchmarks had at least 1 dependent output
variable (as per Definition 3). Of the 1,141 benchmarks, 260 either timed out
(41 total) or ran out of memory (219 total); 227 of these failed because the
NBA construction (adapted from Spot), i.e., Step 1 in our pipeline, did not
terminate. We found that all the benchmarks with at least 1 dependent variable
in fact belong to one of 5 benchmark families, as seen in Table 1. In order to
measure the prevalence of dependency we evaluated (1) the number of dependent
variables and (2) the dependency ratio

\[ \text{dependency ratio} = \frac{\text{Total dependent vars}}{\text{Total output vars}}. \]

Out of those depicted, mux (for multiplexer) and shift (for shift operator)
were two benchmark families where the dependency ratio was 1. In total, among
all those where our dependency checking algorithm terminated, we found 26
benchmarks with all the output variables dependent. Of these, 4 benchmarks
were from shift, 4 from mux, 14 from tsl-paper, and 4 from
tsl-smart-home-jarvis.


Benchmark Family      | Total | Completed | Found Dep | Avg Dep Ratio
ltl2dpa               | 24    | 24        | 24        | .434
mux                   | 12    | 12        | 4         | 1
shift                 | 11    | 4         | 4         | 1
tsl-paper             | 118   | 117       | 115       | .46
tsl-smart-home-jarvis | 189   | 167       | 153       | .33


Table 1. Summary for 5 benchmark families, indicating the total number of
benchmarks, the number for which the dependency-finding process completed, the
number of benchmarks with dependent variables, and the average dependency
ratio among those with dependencies.
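The dependency ratio used above is simple to compute per benchmark. The following is a minimal sketch, assuming each benchmark is summarized as a pair (dependent outputs, total outputs); the function names are illustrative and are not part of DepSynt.

```python
from typing import Iterable, Tuple


def dependency_ratio(total_dependent_vars: int, total_output_vars: int) -> float:
    """Fraction of output variables that are dependent (between 0 and 1)."""
    if total_output_vars <= 0:
        raise ValueError("a benchmark needs at least one output variable")
    if not 0 <= total_dependent_vars <= total_output_vars:
        raise ValueError("dependent outputs must be a subset of all outputs")
    return total_dependent_vars / total_output_vars


def average_ratio_over_dependent(benchmarks: Iterable[Tuple[int, int]]) -> float:
    """Average ratio over benchmarks with at least one dependent output,
    mirroring the 'Avg Dep Ratio' column of Table 1."""
    ratios = [dependency_ratio(d, o) for d, o in benchmarks if d > 0]
    return sum(ratios) / len(ratios) if ratios else 0.0
```

A ratio of 1 corresponds to the total-dependency case reported for the mux and shift families.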
Looking beyond total dependency, among the 300 benchmarks with at least 1
dependent variable, we found a diverse distribution of dependent variables as
shown in Figure 3 (distribution wrt dependency ratio is in [2]).

Fig. 3. Cumulative count of benchmarks for each unique value of Total Dependent
Variables. F(x) on the y-axis represents how many benchmarks have at most x
(on the x-axis) dependent variables.


Utilizing Dependency for Reactive Synthesis: Comparison with other tools.
Despite a large 1 hr timeout, we noticed that most dependent variables were
found within 10-12 seconds. Hence, in our tool DepSynt, we limited the time
for the dependency check to an empirically determined 12 seconds, and declared
variables left unchecked after this time as non-dependent. Since synthesis of
non-dependents Ty (Step 5 of the pipeline) is implemented directly using Spot
APIs, the difference between our approach and Spot is minimal when there are a
large number of non-dependent variables. This motivated us to divide our
experimental comparison, among the 300 benchmarks where at least one dependent
variable was found, into benchmarks with at most 3 non-dependent variables
(162 benchmarks) and more than 3 non-dependent variables (138 benchmarks). We
compared DepSynt with two state-of-the-art synthesis tools that won in
different tracks of SYNTCOMP'23 [21]: (i) Ltlsynt (based on Spot) [7] with
different configurations ACD, SD, DS, LAR, and (ii) Strix [22] with the
configuration of BFS for exploration and FPI as parity game solver (the
overall winning configuration/tool in SYNTCOMP'23). All the tools had a total
timeout of 3 hours per benchmark. As can be seen from Figure 4, for the case
of at most 3 non-dependent variables, DepSynt indeed outperforms the highly
optimized competition-winning tools. Even for the case of more than 3, as
shown in Figure 5, the performance of DepSynt is comparable to other tools,
only beaten eventually by Strix. DepSynt uniquely solved 2 specifications for
which both Strix and Ltlsynt timed out after 3600s: the benchmarks mux32 and
mux64, which it solved in 2ms and 4ms respectively.
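The time-bounded dependency check described above can be sketched as follows. This is only an illustration under stated assumptions: `is_dependent` is a hypothetical callback standing in for the dependency check of Section 3 (it is not a Spot or DepSynt API), and the budget is checked between variables, so a single long check may overrun it.

```python
import time
from typing import Callable, Iterable, List, Tuple


def classify_outputs(
    outputs: Iterable[str],
    is_dependent: Callable[[str], bool],  # hypothetical per-variable check
    budget_seconds: float = 12.0,         # the empirically chosen bound
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[List[str], List[str]]:
    """Split outputs into (dependent, non_dependent) within a time budget."""
    deadline = clock() + budget_seconds
    dependent: List[str] = []
    non_dependent: List[str] = []
    for var in outputs:
        if clock() >= deadline:
            # Budget exhausted: unchecked variables are declared non-dependent.
            non_dependent.append(var)
        elif is_dependent(var):
            dependent.append(var)
        else:
            non_dependent.append(var)
    return dependent, non_dependent
```

Treating unchecked variables as non-dependent is the conservative choice: they are simply handed to the regular synthesis step.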

Analyzing time taken by different parts of the pipeline. In order to better 
understand where DepSynt spends its time, we plotted in Figure 6 the normalized 
time distribution of DepSynt. We can see that synthesizing a strategy for
dependent variables is very fast (the yellow portion), consistent with its
theoretical linear complexity bound, and so is the pink region depicting
searching for dependency (again, a poly-time algorithm), especially compared
to the blue region (synthesizing a strategy for the non-dependent variables)
and the green region (NBA build time). This also explains why having a high
dependency ratio alone does not help our
approach, since even with a high ratio, the number of non-dependent variables 
could be large, resulting in worse performance overall. 


Analysis of the Projection step (Step 3) of Pipeline. The rationale for
projecting variables from the NBA is to reduce the number of output
non-dependent variables in the synthesis of the NBA, which is the most
expensive phase as Figure 6 shows. To see if this indeed contributes to our
better performance, we asked if projecting the dependent variables reduces the
sizes of the BDDs that represent the transitions, measured in total nodes.
Figure 7 shows that the BDDs' sizes are reduced significantly where the total
of non-dependent variables is at most 3; in cases of total dependency, the BDD
just vanishes and is replaced by the constant true/false. For the case where
the total of non-dependent variables is 4 or more, the BDD size is reduced as
well.


An ablation experiment with Spot. As a final check that dependency was causing
the improvements seen, we conducted a control/ablation experiment where in
DepSynt we gave zero timeout to find dependency, classified all output
variables as non-dependent, and called this SpotModular. As can be seen in
Figure 8, for the case of benchmarks with at least 1 dependent and at most 3
non-dependent variables, this clearly shows the benefit of dependency-checking.
In the full version [2], we show that for other cases we do not see this.

Fig. 8. Cactus plot comparing DepSynt and SpotModular on 162 benchmarks with
at most 3 non-dependent variables (axes: instances, time in seconds).


Summary. Overall, we answered both the research questions we started with. 
Indeed there are several benchmarks with dependent variables, and using our 
pipeline does give performance benefits when no. of non-dependent variables is 
low. Our recipe would be to first run our poly-time check to see if there are depen- 
dents and use our approach if there are not too many non-dependents; otherwise 
switch to any existing method. To summarize our comparisons: wrt Strix, we
found 252 benchmarks that had dependent variables in which DepSynt took less
time than Strix. In 126 of these, DepSynt took at least 1 second less than
Strix. Among these, for 10 benchmarks (shift16, LightsTotal_d65ed84e,
LightsTotal_9cbf2546, LightsTotal_06e9cad4, Lights2_f3987563, Lights2_0f5381e9,
FelixSpecFixed3.core_b209ff21, Lights2_b02056d6, Lights2_06e9cad4,
LightsTotal_2c5b09da) the time taken by DepSynt was at least 10 seconds less
than that taken by Strix. These are the examples that are easier to solve by
DepSynt than by Strix. For shift16, the difference was more than 1056 seconds
in favor of DepSynt. Interestingly, shift16 also has all output variables
dependent.

When comparing with Ltlsynt, we found 193 benchmarks that had dependent
variables in which DepSynt took less time than Ltlsynt. Among these, in 27
benchmarks DepSynt took at least 1 second less than Ltlsynt. Of these, there is
one benchmark (ModifiedLedMatrix5X) for which the time taken by DepSynt
was at least 10 seconds less than that taken by Ltlsynt. Specifically, DepSynt
took 5 seconds and Ltlsynt took 55 seconds.


7 Conclusion 


In this work, we have introduced the notion of dependent variables in the con- 
text of reactive synthesis. We showed that dependent variables are prevalent 
in reactive synthesis benchmarks and suggested a synthesis approach that may 
utilize these dependencies for better synthesis. As part of future work, we wish to
explore heuristics for choosing "good" maximal subsets of dependent variables. 
We also wish to explore integration of our method in other reactive synthesis 
tools such as Strix. 
Fig. 4. Cactus plot comparing DepSynt, LtlSynt, and Strix on
162 benchmarks with at most 3 non-dependent variables.

Fig. 5. Cactus plot comparing DepSynt, LtlSynt, and Strix on
138 benchmarks with more than 3 non-dependent variables.

Fig. 6. Normalized time distribution of DepSynt sorted by total duration over
benchmarks that could be solved successfully by DepSynt. Each color represents
a different phase of DepSynt. Pink is searching for dependency, green is the
NBA build, blue is synthesis of non-dependent variables and yellow is
dependent variables synthesis.

Fig. 7. Total BDD sizes of the NBA edges before and after the projection of
the dependent variables from the NBA edges. The left figure is over benchmarks
with at most 3 non-dependent variables and the right figure is over benchmarks
with 4 or more non-dependent variables. The solid line presents the projected
BDD size and the dotted line presents the original BDD size. The y-axis is
presented in symmetric log-scale. Benchmarks are sorted by the projected NBA's
BDD total size.

Abstract. This paper presents an approach for synthesizing provably
correct control envelopes for hybrid systems. Control envelopes charac- 
terize families of safe controllers and are used to monitor untrusted con- 
trollers at runtime. Our algorithm fills in the blanks of a hybrid system's 
sketch specifying the desired shape of the control envelope, the possible
control actions, and the system's differential equations. In order to max- 
imize the flexibility of the control envelope, the synthesized conditions 
saying which control action can be chosen when should be as permissive 
as possible while establishing a desired safety condition from the avail- 
able assumptions, which are augmented if needed. An implicit, optimal 
solution to this synthesis problem is characterized using hybrid systems 
game theory, from which explicit solutions can be derived via symbolic 
execution and sound, systematic game refinements. Optimality can be 
recovered in the face of approximation via a dual game characterization. 
The resulting algorithm, Control Envelope Synthesis via Angelic Refine- 
ments (CESAR), is demonstrated in a range of safe control envelope 
synthesis examples with different control challenges. 


Keywords: Hybrid systems · Program synthesis · Differential game logic


1 Introduction 


Hybrid systems are important models of many applications, capturing their dif- 
ferential equations and control [271H1]3[33H]P8]. For overall system safety, the 
correctness of the control decisions in a hybrid system is crucial. Formal verifica- 
tion techniques can justify correctness properties. Such correct controllers have 
been identified in a sequence of challenging case studies [34,40,12,32,19,14,22]. A
useful approach to verified control is to design and verify a safe control envelope 
around possible safe control actions. Safe control envelopes are nondeterminis- 
tic programs whose every execution is safe. In contrast with controllers, control 
envelopes define entire families of controllers to allow control actions under as 
many circumstances as possible, as long as they maintain the safety of the hybrid 
system. Safe control envelopes allow the verification of abstractions of control 
systems, isolating the parts relevant to the safety feature of interest, without in- 
volving the full complexity of a specific control implementation. The full control 
system is then monitored for adherence to the safe control envelope at run- 
time [29]. The control envelope approach allows a single verification result to 
apply to multiple specialized control implementations, optimized for different 
objectives. It puts industrial controllers that are too complex to verify directly 
within the reach of verification, because a control envelope only needs to model 
the safety-critical aspects of the controller. Control envelopes also enable applica- 
tions like justified speculative control [17], where machine-learning-based agents 
control safety-critical systems safeguarded within a verified control envelope, or 
[36], where these envelopes generate reward signals for reinforcement learning. 

Control envelope design is challenging. Engineers are good at specifying the 
shape of a model and listing the possible control actions by translating client 
specifications, which is crucial for the fidelity of the resulting model. But identi- 
fying the exact control conditions required for safety in a model is a much harder 
problem that requires design insights and creativity, and is the main point of the 
deep area of control theory. Most initial system designs are incorrect and need 
to be fixed before verification succeeds. Fully rigorous justification of the safety 
of the control conditions requires full verification of the resulting controller in 
the hybrid systems model. We present a synthesis technique that addresses this 
hard problem by filling in the holes of a hybrid systems model to identify a 
correct-by-construction control envelope that is as permissive as possible. 

Our approach is called Control Envelope Synthesis via Angelic Refinements 
(CESAR). The idea is to implicitly characterize the optimal safe control envelope 
via hybrid games yielding maximally permissive safe solutions in differential 
game logic [33]. To derive explicit solutions used for controller monitoring at 
runtime, we successively refine the games while preserving safety and, if possible, 
optimality. Our experiments demonstrate that CESAR solves hybrid systems 
synthesis challenges requiring different control insights. 


Contributions. The primary contributions of this paper behind CESAR are: 


— optimal hybrid systems control envelope synthesis via hybrid games.

— differential game logic formulas identifying optimal safe control envelopes.


— refinement techniques for safe control envelope approximation, including 
bounded fixpoint unrollings via a recurrence, which exploits action perma- 
nence (a hybrid analogue to idempotence). 

— a primal/dual game counterpart optimality criterion. 
2 Background: Differential Game Logic


We use hybrid games written in differential game logic (dGL, [33]) to represent 
solutions to the synthesis problem. Hybrid games are two-player noncooperative 
zero-sum sequential games with no draws that are played on a hybrid system 
with differential equations. Players take turns and in their turn can choose to 
act arbitrarily within the game rules. At the end of the game, one player wins, 
the other one loses. The players are classically called Angel and Demon. Hybrid 
systems, in contrast, have no agents, only a nondeterministic controller running 
in a nondeterministic environment. The synthesis problem consists of filling in 
holes in a hybrid system. Thus, expressing solutions for hybrid system synthesis 
with hybrid games is one of the insights of this paper. 

An example of a game is $(v := 1 \cap v := -1);\ \{x' = v\}$. In this game, first
Demon chooses between setting velocity $v$ to $1$, or to $-1$. Then, Angel evolves
position $x$ as $x' = v$ for a duration of her choice. Differential game logic uses
modalities to set win conditions for the players. For example, in the formula
$[(v := 1 \cap v := -1);\ \{x' = v\}]\ x \neq 0$, Demon wins the game when $x \neq 0$ at the
end of the game and Angel wins otherwise. The overall formula represents the
set of states from which Demon can win the game, which is $x \neq 0$ because when
$x < 0$, Demon has the winning strategy to pick $v := -1$, so no matter how long
Angel evolves $x' = v$, $x$ remains negative. Likewise, when $x > 0$, Demon can pick
$v := 1$. However, when $x = 0$, Angel has a winning strategy: to evolve $x' = v$ for
zero time, so that $x$ remains zero regardless of Demon's choice.
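The winning strategies in this example can be replayed numerically. The sketch below is only an illustration: it uses the closed-form solution x(t) = x + v t of the ODE, and the function names are introduced here and do not come from the paper or any tool.

```python
from typing import Iterable


def demon_velocity(x: float) -> int:
    """Demon's strategy: move away from 0 (the choice is irrelevant at x == 0)."""
    return 1 if x > 0 else -1


def evolve(x: float, v: int, duration: float) -> float:
    """Solution of x' = v after Angel's chosen duration (must be >= 0)."""
    if duration < 0:
        raise ValueError("durations are non-negative")
    return x + v * duration


def demon_wins(x0: float, angel_durations: Iterable[float]) -> bool:
    """True if x != 0 at the end of the game for every sampled Angel duration."""
    v = demon_velocity(x0)
    return all(evolve(x0, v, t) != 0 for t in angel_durations)
```

Sampling durations cannot prove the formula, but it shows the asymmetry: from any nonzero start Demon's choice keeps x away from 0, while from x = 0 Angel wins by evolving for zero time.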

We summarize dGL's program notation (Table 1). See [33] for full exposition.
Assignment $x := \theta$ instantly changes the value of variable $x$ to the value of $\theta$.
Challenge $?\psi$ continues the game if $\psi$ is satisfied in the current state, otherwise
Angel loses immediately. In continuous evolution $\{x' = \theta \;\&\; \psi\}$ Angel follows the
differential equation $x' = \theta$ for some duration of her choice, but loses immediately
on violating $\psi$ at any time. Sequential game $\alpha; \beta$ first plays $\alpha$ and when it
terminates without a player having lost, continues with $\beta$. Choice $\alpha \cup \beta$ lets Angel
choose whether to play $\alpha$ or $\beta$. For repetition $\alpha^{*}$, Angel repeats $\alpha$ some number
of times, choosing to continue or terminate after each round. The dual game $\alpha^{d}$
switches the roles of players. For example, in the game $?\psi^{d}$, Demon passes the
challenge if the current state satisfies $\psi$, and otherwise loses immediately.


Table 1: Hybrid game operators for two-player hybrid systems


Game | Effect
$x := \theta$ | assign value of term $\theta$ to variable $x$
$?\psi$ | Angel passes challenge if formula $\psi$ holds in current state, else loses immediately
$\{x_1' = \theta_1, \ldots, x_n' = \theta_n \;\&\; \psi\}$ | Angel evolves $x_i$ along differential equation system $x_i' = \theta_i$ for choice of duration $\ge 0$, loses immediately when violating $\psi$
$\alpha; \beta$ | sequential game, first play hybrid game $\alpha$, then hybrid game $\beta$
$\alpha \cup \beta$ | Angel chooses to follow either hybrid game $\alpha$ or $\beta$
$\alpha^{*}$ | Angel repeats hybrid game $\alpha$, choosing to stop or go after each $\alpha$
$\alpha^{d}$ | dual game switches player roles between Angel and Demon
$\alpha \cap \beta$ | demonic choice $(\alpha^{d} \cup \beta^{d})^{d}$ gives choice between $\alpha$ and $\beta$ to Demon
$\alpha^{\times}$ | demonic repetition $((\alpha^{d})^{*})^{d}$ gives control of repetition to Demon

In games restricted to the structures listed above but without $\alpha^{d}$, all choices
are resolved by Angel alone with no adversary, and hybrid games coincide with
hybrid systems in differential dynamic logic (dL) [33]. We will use this restriction
to specify the synthesis question, the sketch that specifies the shape and safety
properties of control envelopes. But to characterize the solution that fills in the
blanks of the control envelope sketch, we use games where both Angel and Demon
play. Notation we use includes demonic choice $\alpha \cap \beta$, which lets Demon choose
whether to run $\alpha$ or $\beta$. Demonic repetition $\alpha^{\times}$ lets Demon choose whether to
repeat $\alpha$, choosing whether to stop or go at the end of every run. We define $\alpha^{*\le n}$
and $\alpha^{\times\le n}$ for angelic and demonic repetitions respectively of at most $n$ times.

In order to express properties about hybrid games, differential game logic 
formulas refer to the existence of winning strategies for objectives of the games 
(e.g., a controller has a winning strategy to achieve collision avoidance despite 
an adversarial environment). The set of dGL formulas is generated by the
following grammar (where $\sim\ \in \{<, \le, =, \ge, >\}$ and $\theta_1, \theta_2$ are arithmetic expressions
in $+, -, \cdot, /$ over the reals, $x$ is a variable, $\alpha$ is a hybrid game):


\[ \phi ::= \theta_1 \sim \theta_2 \mid \neg\phi \mid \phi \land \psi \mid \phi \lor \psi \mid \phi \to \psi \mid \forall x\, \phi \mid \exists x\, \phi \mid [\alpha]\,\phi \mid \langle\alpha\rangle\,\phi \]


Comparisons of arithmetic expressions, Boolean connectives, and quantifiers over
the reals are as usual. The modal formula $\langle\alpha\rangle\,\phi$ expresses that player Angel has
a winning strategy to reach a state satisfying $\phi$ in hybrid game $\alpha$. Modal formula
$[\alpha]\,\phi$ expresses the same for Demon. The fragment without modalities is
first-order real arithmetic. Its fragment without quantifiers is called propositional
arithmetic $\mathcal{P}_{\mathbb{R}}$. Details on the semantics of dGL can be found in [33]. A formula
$\phi$ is valid, written $\vDash \phi$, iff it is true in every state $\omega$. States are functions assigning
a real number to each variable. For instance, $\phi \to [\alpha]\,\psi$ is valid iff, from all initial
states satisfying $\phi$, Demon has a winning strategy in game $\alpha$ to achieve $\psi$.


Control Safety Envelopes by Example. In order to separate safety critical aspects 
from other system goals during control design, we abstractly describe the safe 
choices of a controller with safe control envelopes that deliberately underspecify 
when and how to exactly execute certain actions. They focus on describing in 
which regions it is safe to take actions. For example, Model 1 designs a train
control envelope that must stop the train by the end of movement
authority $e$ located somewhere ahead, as assigned by the train network scheduler.
Past $e$, there may be obstacles or other trains. The train's control choices are
to accelerate or brake as it moves along the track. The goal of CESAR is to
synthesize the framed formulas in the model, that are initially blank.


Model 1 The train ETCS model (slightly modified from [34]). Framed formulas
are initially blank and are automatically synthesized by our tool as indicated.

\[
\begin{array}{l|r|l}
\text{assum} & 1 & A > 0 \land B > 0 \land T > 0 \land v \ge 0 \;\land \\
\text{ctrlable} & 2 & \boxed{e - p > v^2/2B} \;\to \\
\text{ctrl} & 3 & \big[\, \big( \big( (\,?\boxed{e - p > vT + AT^2/2 + (v + AT)^2/2B}\,;\ a := A) \\
 & 4 & \quad \cup\ (\,?\boxed{\mathit{true}}\,;\ a := -B) \big)\,; \\
\text{plant} & 5 & \quad (t := 0\,;\ \{p' = v,\ v' = a,\ t' = 1 \;\&\; t \le T \land v \ge 0\}) \\
\text{safe} & 6 & \big)^{*} \big]\ (e - p > 0)
\end{array}
\]


Line 6 describes the safety property that is to be enforced at all times: the
train driving at position $p$ with velocity $v$ must not go past position $e$. Line 1
lists modeling assumptions: the train is capable of both acceleration ($A > 0$) and
deceleration ($B > 0$), the controller latency is positive ($T > 0$) and the train cannot
move backwards as a product of braking (this last fact is also reflected by having
$v \ge 0$ as a domain constraint for the plant on Line 5). These assumptions are
fundamentally about the physics of the problem being considered. In contrast,
Line 2 features a controllability assumption that can be derived from careful
analysis. Here, this synthesized assumption says that the train cannot start so
close to $e$ that it won't stop in time even if it starts braking immediately. Line 3
and Line 4 describe a train controller with two actions: accelerating ($a := A$)
and braking ($a := -B$). Each action is guarded by a synthesized formula, called
an action guard, that indicates when it is safe to use. Angel has control over
which action runs, and adversarially plays with the objective of violating safety
conditions. But Angel's options are limited to only safe ones because of the
synthesized action guards, ensuring that Demon still wins and the overall formula
is valid. In this case, braking is always safe whereas acceleration can only be
allowed when the distance to end position $e$ is sufficiently large. Finally, the
plant on Line 5 uses differential equations to describe the train's kinematics. A
timer variable $t$ is used to ensure that no two consecutive runs of the controller
are separated by more than time $T$. Thus, this controller is time-triggered.
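To make the role of the two synthesized formulas concrete, the following sketch evaluates them numerically. It assumes the formulas read e - p > v^2/(2B) for the controllability assumption (Line 2) and e - p > vT + AT^2/2 + (v + AT)^2/(2B) for the acceleration guard (Line 3), and it uses closed-form constant-acceleration kinematics; the function names are illustrative only.

```python
from typing import Tuple


def controllable(e: float, p: float, v: float, B: float) -> bool:
    """Line 2 (as read here): braking immediately stops the train before e."""
    return e - p > v**2 / (2 * B)


def accel_guard(e: float, p: float, v: float, A: float, B: float, T: float) -> bool:
    """Line 3 (as read here): accelerating for up to T keeps the train controllable."""
    return e - p > v * T + A * T**2 / 2 + (v + A * T) ** 2 / (2 * B)


def step(p: float, v: float, a: float, t: float) -> Tuple[float, float]:
    """Position and velocity after time t under constant acceleration a.
    When braking, evolution stops once v reaches 0 (domain constraint v >= 0)."""
    if a < 0:
        t = min(t, v / -a)
    return p + v * t + a * t**2 / 2, v + a * t
```

For instance, with e = 100, p = 0, v = 10, A = 2, B = 5, T = 1 the acceleration guard holds, and the state reached after accelerating for any duration up to T still satisfies the controllability condition.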


Overview of CESAR. CESAR first identifies the optimal solution for the blank
of Line 2. Intuitively, this blank should identify a controllable invariant, which
denotes a set of states where a controller with choice between acceleration and
braking has some strategy (to be enforced by the conditions of Line 3 and Line 4)
that guarantees safe control forever. Such states can be characterized by the
following dGL formula where Demon, as a proxy for the controller, decides whether
to accelerate or brake: $[((a := A \cap a := -B);\ \mathit{plant})^{*}]\ \mathit{safe}$ where plant and safe
are from Model 1. When this formula is true, Demon, who decides when to brake
to maintain the safety contract, has a winning strategy that the controller can
mimic. When it is false, Demon, a perfect player striving to maintain safety, has
no winning strategy, so a controller has no guaranteed way to stay safe either.
This dGL formula provides an implicit characterization of the optimal
controllable invariant from which we derive an explicit formula in $\mathcal{P}_{\mathbb{R}}$ to fill the blank
with using symbolic execution. Symbolic execution solves a game following the
axioms of dGL to produce an equivalent $\mathcal{P}_{\mathbb{R}}$ formula (Section 3.7). However, our
dGL formula contains a loop, for which symbolic execution will not terminate
in finite time. To reason about the loop, we refine the game, modifying it so
that it is easier to symbolically execute, but still at least as hard for Demon to
win so that the controllable invariant that it generates remains sound. In this
example, the required game transformation first restricts Demon's options to
braking. Then, it eliminates the loop using the observation that the repeated
hybrid iterations $(a := -B;\ \mathit{plant})^{*}$ behave the same as just following the
continuous dynamics of braking for unbounded time. It replaces the original game
with $a := -B;\ t := 0;\ \{p' = v,\ v' = a \;\&\; v \ge 0\}$, which is loop-free and
easily symbolically executed. Symbolically executing this game to reach safety
condition safe yields controllable invariant $e - p > \frac{v^2}{2B}$ to fill the blank of Line 2.
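The invariant for Line 2 can also be checked by hand. The derivation below is a sketch that assumes constant braking $a = -B$ with $B > 0$ from a state with $v \ge 0$ and uses the closed-form solution of the braking dynamics; the names $p(t)$ and $v(t)$ are introduced here for illustration.

```latex
\begin{align*}
v(t) &= v - Bt, \qquad p(t) = p + vt - \tfrac{1}{2}Bt^2
  && \text{(solution of } p' = v,\ v' = -B\text{)} \\
v(t) \ge 0 &\iff 0 \le t \le v/B
  && \text{(domain constraint)} \\
\max_{0 \le t \le v/B} p(t) &= p(v/B) = p + \frac{v^2}{2B}
  && \text{(} p(t) \text{ is nondecreasing while } v(t) \ge 0\text{)} \\
\big(\forall t \in [0, v/B].\ e - p(t) > 0\big) &\iff e - p > \frac{v^2}{2B}
\end{align*}
```

The last line is the quantifier-free condition: the train stays short of $e$ under permanent braking exactly when its remaining distance exceeds the braking distance $v^2/(2B)$.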

Intuitively, this refinement (formalized in Section 3.4) captures situations
where the controller stays safe forever by picking a single control action (brak- 
ing). It generates the optimal solution for this example because braking forever 
is the dominant strategy: given any state, if braking forever does not keep the 
train safe, then certainly no other strategy will. However, there are other prob- 
lems where the dominant control strategy requires the controller to strategically 
switch between actions, and this refinement misses some controllable invariant 
states. So we introduce a new refinement: bounded game unrolling via a
recurrence (Section 3.5). A solution generated by unrolling $n$ times captures states
where the controller can stay safe by switching control actions up to n times. 

Having synthesized the controllable invariant, CESAR fills the action guards
(Line 3 and Line 4). An action should be permissible when running it for one
iteration maintains the controllable invariant. For example, acceleration is safe
to execute exactly when $[a := A;\ \mathit{plant}]\ e - p > \frac{v^2}{2B}$. We symbolically execute
this game to synthesize the formula that fills the guard of Line 3.


3 Approach 


This section formally introduces the Control Envelope Synthesis via Angelic Re- 
finements (CESAR) approach for hybrid systems control envelope synthesis. 


3.1 Problem Definition 


We frame the problem of control envelope synthesis in terms of filling in holes
$\sqcup$ in a problem of the following shape:


\[ \mathit{prob} \equiv \mathit{assum} \land \sqcup \to \big[\big((\textstyle\bigcup_i (?\sqcup_i\,;\ \mathit{act}_i))\,;\ \mathit{plant}\big)^{*}\big]\ \mathit{safe}. \tag{1} \]


Here, the control envelope consists of a nondeterministic choice between a finite
number of guarded actions. Each action $\mathit{act}_i$ is guarded by a condition $\sqcup_i$ to be
determined in a way that ensures safety within a controllable invariant [6,18] $\sqcup$,
also to be synthesized. The plant is defined by the following template:


\[ \mathit{plant} \equiv t := 0\,;\ \{x' = f(x),\ t' = 1 \;\&\; \mathit{domain} \land t \le T\}. \tag{2} \]


150 Aditi Kabra, Jonathan Laurent, Stefan Mitsch, and André Platzer 


This ensures that the plant must yield to the controller after time T at most, 
where $T$ is assumed to be positive and constant. In addition, we make the
following assumptions:


1. Components assum, safe and domain are propositional arithmetic formulas. 

2. Timer variable t is fresh (does not occur except where shown in template). 

3. Programs $\mathit{act}_i$ are discrete dL programs that can involve choices, assignments
and tests with propositional arithmetic. Variables assigned by $\mathit{act}_i$ must not
appear in safe. In addition, $\mathit{act}_i$ must terminate in the sense that $\vDash \langle\mathit{act}_i\rangle\, \mathit{true}$.

4. The modeling assumptions assum are invariant in the sense that
$\vDash \mathit{assum} \to [(\bigcup_i \mathit{act}_i);\ \mathit{plant}]\ \mathit{assum}$. This holds trivially for assumptions about constant
parameters such as $A > 0$ in Model 1 and this ensures that the controller
can always rely on them being true.


Definition 1. A solution to the synthesis problem above is defined as a pair
$(I, G)$ where $I$ is a formula and $G$ maps each action index $i$ to a formula $G_i$. In
addition, the following conditions must hold:


1. Safety is guaranteed: $\mathit{prob}(I, G) \equiv \mathit{prob}[\sqcup \mapsto I,\ \sqcup_i \mapsto G_i]$ is valid and
$(\mathit{assum} \land I)$ is a loop invariant that proves it so.
2. There is always some action: $(\mathit{assum} \land I) \to \bigvee_i G_i$ is valid.


Condition 2 is crucial for using the resulting nondeterministic control envelope,
since it guarantees that safe actions are always available as a fallback.


3.2 An Optimal Solution 


Solutions to a synthesis problem may differ in quality. Intuitively, a solution is
better than another if it allows for a strictly larger controllable invariant. In
case of equality, the solution with the more permissive control envelope wins.
Formally, given two solutions $S = (I, G)$ and $S' = (I', G')$, we say that $S'$ is
better or equal to $S$ (written $S \sqsubseteq S'$) if and only if $\vDash \mathit{assum} \to (I \to I')$ and
additionally either $\vDash \mathit{assum} \to \neg(I' \to I)$ or $\vDash (\mathit{assum} \land I) \to \bigwedge_i (G_i \to G'_i)$.
Given two solutions $S$ and $S'$, one can define a solution
$S \sqcap S' = (I \lor I',\ i \mapsto (I \land G_i \lor I' \land G'_i))$ that is better or equal to both $S$ and $S'$
($S \sqsubseteq S \sqcap S'$ and $S' \sqsubseteq S \sqcap S'$). A solution $S'$ is called the optimal solution when it is the maximum
element in the ordering, so that for any other solution $S$, $S \sqsubseteq S'$. The optimal
solution exists and is expressible in dGL:


\[ I^{\mathrm{opt}} \equiv \big[\big((\textstyle\bigcap_i \mathit{act}_i)\,;\ \mathit{plant}\big)^{*}\big]\ \mathit{safe} \tag{3} \]
\[ G^{\mathrm{opt}}_i \equiv [\mathit{act}_i\,;\ \mathit{plant}]\ I^{\mathrm{opt}}. \tag{4} \]


Intuitively, $I^{\mathrm{opt}}$ characterizes the set of all states from which an optimal
controller (played here by Demon) can keep the system safe forever. In turn, $G^{\mathrm{opt}}_i$ is
defined to allow any control action that is guaranteed to keep the system within
$I^{\mathrm{opt}}$ until the next control cycle as characterized by a modal formula. Section 3.3
formally establishes the correctness and optimality of $S^{\mathrm{opt}} \equiv (I^{\mathrm{opt}}, G^{\mathrm{opt}})$.
While it is theoretically reassuring that an optimal solution exists that is
at least as good as all others and that this optimum can be characterized in 
dGL, such a solution is of limited practical usefulness since Eq. cannot be 
executed without solving a game at runtime. Rather, we are interested in explicit 
solutions where I and G are quantifier-free real arithmetic formulas. There is no 
guarantee in general that such solutions exist that are also optimal, but our goal 
is to devise an algorithm to find them in the many cases where they exist or find 
safe approximations otherwise. 


3.3 Controllable Invariants 


The fact that $S^{\mathrm{opt}}$ is a solution can be characterized in logic with the notion of
a controllable invariant that, at each of its points, admits some control action
that keeps the plant in the invariant for one round. All lemmas and theorems
throughout this paper are proved in the extended preprint [21, Appendix B].


Definition 2 (Controllable Invariant). A controllable invariant is a formula
$I$ such that $\vDash I \to \mathit{safe}$ and $\vDash I \to \bigvee_i [\mathit{act}_i\,;\ \mathit{plant}]\ I$.


From this perspective, $I^{\mathrm{opt}}$ can be seen as the largest controllable invariant.


Lemma 1. $I^{\mathrm{opt}}$ is a controllable invariant and it is optimal in the sense that
$\vDash I \to I^{\mathrm{opt}}$ for any controllable invariant $I$.


Moreover, not just $I^{\mathrm{opt}}$, but every controllable invariant induces a solution.
Indeed, given a controllable invariant $I$, we can define $\mathcal{G}(I) \equiv (i \mapsto [\mathit{act}_i\,;\ \mathit{plant}]\ I)$
for the control guards induced by $I$. $\mathcal{G}(I)$ chooses as the guard for each action
$\mathit{act}_i$ the modal condition ensuring that $\mathit{act}_i$ preserves $I$ after the plant.


Lemma 2. If $I$ is a controllable invariant, then $(I, \mathcal{G}(I))$ is a solution (Def. 1).


Conversely, a controllable invariant can be derived from any solution.


Lemma 3. If $(I, G)$ is a solution, then $I' \equiv (\mathit{assum} \land I)$ is a controllable
invariant. Moreover, we have $(I, G) \sqsubseteq (I', \mathcal{G}(I'))$.


Solution comparisons w.r.t. $\sqsubseteq$ reduce to implications for controllable invariants.


Lemma 4. If $I$ and $I'$ are controllable invariants, then $(I, \mathcal{G}(I)) \sqsubseteq (I', \mathcal{G}(I'))$ if
and only if $\vDash \mathit{assum} \to (I \to I')$.


Taken together, these lemmas allow us to establish the optimality of $S^{\mathrm{opt}}$.


Theorem 1. $S^{\mathrm{opt}}$ is an optimal solution (i.e. a maximum w.r.t. $\sqsubseteq$) of Def. 1.


This shows the roadmap for the rest of the paper: finding solutions to the control
envelope synthesis problem reduces to finding controllable invariants that imply
$I^{\mathrm{opt}}$, which can be found by restricting the actions available to Demon in $I^{\mathrm{opt}}$
to guarantee safety, thereby refining the associated game.
3.4 One-Shot Fallback Refinement


The simplest refinement of I?P* is obtained when fixing a single fallback action 
to use in all states (if that is safe). A more general refinement considers different 
fallback actions in different states, but still only plays one such action forever. 
Using the dGL axioms, any loop-free dGL formula whose ODEs admit solutions 
expressible in real arithmetic can be automatically reduced to an equivalent 
first-order arithmetic formula (in $\mathrm{FOL}_{\mathbb{R}}$). An equivalent propositional arithmetic
formula in $\mathcal{P}_{\mathbb{R}}$ can be computed via quantifier elimination (QE). For example:


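$$
\begin{aligned}
& [(v := 1 \cap v := -1) ; \{x' = v\}]\, x \neq 0 \\
\equiv\ & [v := 1 \cap v := -1]\, [\{x' = v\}]\, x \neq 0 && \text{by } [;] \\
\equiv\ & [v := 1]\, [\{x' = v\}]\, x \neq 0 \lor [v := -1]\, [\{x' = v\}]\, x \neq 0 && \text{by } [\cap] \\
\equiv\ & [\{x' = 1\}]\, x \neq 0 \lor [\{x' = -1\}]\, x \neq 0 && \text{by } [:=] \\
\equiv\ & (\forall t \geq 0\ \, x + t \neq 0) \lor (\forall t \geq 0\ \, x - t \neq 0) && \text{by } ['] \\
\equiv\ & x > 0 \lor x < 0 && \text{by QE.}
\end{aligned}
$$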
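The reduction above can be sanity-checked numerically. The following is a minimal sketch, not part of the paper's tooling: the function names are hypothetical, and the closed-form test relies on the assumption that the ODE $x' = v$ with constant $v \neq 0$ has the solution $x + vt$, whose only root is $t = -x/v$.

```python
def box_ode_nonzero(x: float, v: float) -> bool:
    """Truth of [{x' = v}] x != 0, i.e. for all t >= 0: x + v*t != 0.

    Assumes a constant, nonzero velocity v (illustrative helper).
    """
    if x == 0:
        return False              # already violated at t = 0
    t_root = -x / v               # the unique time at which x + v*t = 0
    return t_root < 0             # safe iff that time lies in the past


def demon_choice(x: float) -> bool:
    """[(v := 1 ∩ v := -1); {x' = v}] x != 0: Demon needs one winning branch."""
    return box_ode_nonzero(x, 1.0) or box_ode_nonzero(x, -1.0)


def qe_result(x: float) -> bool:
    """The quantifier-free formula obtained at the end of the derivation."""
    return x > 0 or x < 0


samples = [-3.0, -0.5, 0.0, 0.25, 7.0]
agree = all(demon_choice(x) == qe_result(x) for x in samples)
```

On these sample points both formulations agree, which is consistent with the derivation but of course does not replace the symbolic proof.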


Even when a formula features nonsolvable ODEs, techniques exist to compute
weakest preconditions for differential equations, with conservative approximations
[38] or even exactly in some cases [35,8]. In the rest of this section and for
most of this paper, we are therefore going to assume the existence of a reduce
oracle that takes as an input a loop-free dGL formula and returns a quantifier-free
arithmetic formula that is equivalent modulo some assumptions. Section 3.7
shows how to implement and optimize reduce.


Definition 3 (Reduction Oracle). A reduction oracle is a function reduce
that takes as an input a loop-free dGL formula $F$ and an assumption $A \in \mathcal{P}_{\mathbb{R}}$. It
returns a formula $R \in \mathcal{P}_{\mathbb{R}}$ along with a boolean flag exact such that the formula
$A \to (R \to F)$ is valid, and if exact is true, then $A \to (R \leftrightarrow F)$ is valid as well.


Back to our original problem, $I^{\mathrm{opt}}$ is not directly reducible since it involves a
loop. However, conservative approximations can be computed by restricting the
set of strategies that the Demon player is allowed to use. One extreme case allows
Demon to only use a single action $\mathit{act}_i$ repeatedly as a fallback (e.g. braking in the
train example). In this case, we get a controllable invariant $[(\mathit{act}_i ; \mathit{plant})^*]\, \mathit{safe}$,
which further simplifies into $[\mathit{act}_i ; \mathit{plant}_\infty]\, \mathit{safe}$ with


$$ \mathit{plant}_\infty \equiv \{x' = f(x),\ t' = 1 \ \&\ \mathit{domain}\} $$


a variant of plant that never yields control. For this last step to be valid though, 
a technical assumption is needed on act;, which we call action permanence. 


Definition 4 (Action Permanence). An action $\mathit{act}_i$ is said to be permanent
if and only if $(\mathit{act}_i ; \mathit{plant} ; \mathit{act}_i) \equiv (\mathit{act}_i ; \mathit{plant})$, i.e., they are equivalent games.


Intuitively, an action is permanent if executing it more than once in a row 
has no consequence for the system dynamics. This is true in the common case 
of actions that only assign constant values to control variables that are read but 
not modified by the plant, such as $a := A$ and $a := -B$ in Model 1.


Lemma 5. If $\mathit{act}_i$ is permanent, $\vDash [(\mathit{act}_i ; \mathit{plant})^*]\, \mathit{safe} \leftrightarrow [\mathit{act}_i ; \mathit{plant}_\infty]\, \mathit{safe}$.


Our discussion so far identifies the following approximation to our original syn- 
thesis problem, where P denotes the set of all indexes of permanent actions: 

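$$ I^0 \equiv [(\textstyle\bigcap_{i \in P} \mathit{act}_i) ; \mathit{plant}_\infty]\, \mathit{safe}, \qquad G^0_i \equiv [\mathit{act}_i ; \mathit{plant}]\, I^0. $$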


Here, $I^0$ encompasses all states from which the agent can guarantee safety
indefinitely with a single permanent action. $G^0$ is constructed according to $G(I^0)$
and only allows actions that are guaranteed to keep the agent within $I^0$ until
the next control cycle. Note that $I^0$ degenerates to false in cases where there are
no permanent actions, which does not make it less of a controllable invariant.


Theorem 2. $I^0$ is a controllable invariant.


Moreover, in many examples of interest, $I^0$ and $I^{\mathrm{opt}}$ are equivalent since an
optimal fallback strategy exists that only involves executing a single action.
This is the case in particular for Model 1 where


$$
\begin{aligned}
I^0 &\equiv [a := -B ; \{p' = v,\ v' = a \ \&\ v \geq 0\}]\ e - p > 0 \\
    &\equiv e - p > v^2 / 2B
\end{aligned}
$$


characterizes all states at safe braking distance to the obstacle and $G^0$ associates
the following guard to the acceleration action:


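$$
\begin{aligned}
G^0_{a := A} &\equiv [a := A ; \{p' = v,\ v' = a,\ t' = 1 \ \&\ v \geq 0 \land t \leq T\}]\ e - p > v^2 / 2B \\
             &\equiv e - p > vT + AT^2/2 + (v + AT)^2 / 2B
\end{aligned}
$$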


That is, accelerating is allowed if doing so is guaranteed to maintain sufficient 
braking distance until the next control opportunity. Section 3.6 discusses automatic
generation of a proof that $(I^0, G^0)$ is an optimal solution for Model 1.
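As a concrete reading of the two closed-form conditions for the train model, the sketch below evaluates $I^0$ and the acceleration guard on numbers and compares the guard against a direct simulation of "accelerate for one full cycle, then brake to a stop". The function names and the sample values are illustrative assumptions; the code further assumes $A \geq 0$, $B > 0$, $v \geq 0$ and $T > 0$, so that the worst case within a cycle is reached at time $T$.

```python
def i0(e: float, p: float, v: float, B: float) -> bool:
    """I^0: the braking distance v^2 / (2B) fits before the limit e."""
    return e - p > v * v / (2 * B)


def g0_accel(e: float, p: float, v: float, A: float, B: float, T: float) -> bool:
    """Guard for a := A: after accelerating for a full cycle T, I^0 still holds."""
    return e - p > v * T + A * T * T / 2 + (v + A * T) ** 2 / (2 * B)


def stop_position(p: float, v: float, A: float, B: float, T: float) -> float:
    """Position reached when accelerating for time T and then braking to v = 0."""
    p_after = p + v * T + A * T * T / 2      # position after one control cycle
    v_after = v + A * T                      # velocity after one control cycle
    return p_after + v_after * v_after / (2 * B)


# Illustrative values (assumptions, not taken from the paper).
p, v, A, B, T = 0.0, 10.0, 1.0, 2.0, 1.0
far_limit, near_limit = 100.0, 40.0
```

With these values the train stops at position 40.75 after one acceleration cycle, so accelerating is permitted for the limit at 100 but not for the limit at 40, even though the latter state still satisfies $I^0$ and braking remains available.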


3.5 Bounded Fallback Unrolling Refinement 


In Section 3.4, we derived a solution by computing an underapproximation of
$I^{\mathrm{opt}}$ where the fallback controller (played by Demon) is only allowed to use
a one-shot strategy that picks a single action and plays it forever. Although 
this approximation is always safe and, in many cases of interest, happens to be 
exact, it does lead to a suboptimal solution in others. In this section, we allow 
the fallback controller to switch actions a bounded number of times before it 
plays one forever. There are still cases where doing so is suboptimal (imagine a 
car on a circular race track that is forced to maintain constant velocity). But 
this restriction is in line with the typical understanding of a fallback controller, 
whose mission is not to take over a system indefinitely but rather to maneuver 
it into a state where it can safely get to a full stop [32]. 

For all bounds $n \in \mathbb{N}$, we define a game where the fallback controller (played
by Demon) takes at most $n$ turns to reach the region $I^0$ in which safety is
guaranteed indefinitely. During each turn, it picks a permanent action and chooses a
time $\theta$ in advance for when it wishes to play its next move. Because the
environment (played by Angel) has control over the duration of each control cycle, the
fallback controller cannot expect to be woken up after time $\theta$ exactly. However,
it can expect to be provided with an opportunity for its next move within the
$[\theta, \theta + T]$ time window since the plant can never execute for time greater than
$T$. Formally, we define $I^n$ as follows:


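$$ I^n \equiv [\mathit{step}^{\times n} ; \mathit{forever}]\, \mathit{safe} \qquad \mathit{forever} \equiv (\textstyle\bigcap_{i \in P} \mathit{act}_i) ; \mathit{plant}_\infty $$

$$ \mathit{step} \equiv (\theta := * ; ?\theta > 0)^d ; (\textstyle\bigcap_{i \in P} \mathit{act}_i) ; \mathit{plant}_{\theta+T} ; ?\mathit{safe}^d ; ?t \geq \theta $$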


where $\mathit{plant}_{\theta+T}$ is the same as $\mathit{plant}$, except that the domain constraint $t \leq T$ is
replaced by $t \leq \theta + T$. Equivalently, we can define $I^n$ by induction as follows:


$$ I^{n+1} \equiv I^n \lor [\mathit{step}]\, I^n \qquad I^0 \equiv [\mathit{forever}]\, \mathit{safe}, \tag{5} $$


where the base case coincides with the definition of $I^0$ in Section 3.4. Importantly,
$I^n$ is a loop-free controllable invariant and so reduce can compute an explicit
solution to the synthesis problem from $I^n$.


Theorem 3. $I^n$ is a controllable invariant for all $n \geq 0$.


Theorem 3 establishes a nontrivial result since it overcomes the significant gap
between the fantasized game that defines $I^n$ and the real game being played by
a time-triggered controller. The proof critically relies on the action permanence
assumption along with a result [21, Lemma 6] establishing that ODEs preserve
a specific form of reach-avoid property as a result of being deterministic.


Example. As an illustration, consider the example in Fig. 1 and Model 2 of a
2D robot moving in a corridor that forms an angle. The robot is only allowed
to move left or down at a constant velocity and must not crash against a wall.
Computing $I^0$ gives us the vertical section of the corridor, in which going down
is a safe one-step fallback. Computing $I^1$ forces us to distinguish two cases. If the
corridor is wider than the maximal distance travelled by the robot in a control
cycle ($VT < 2R$), then the upper section of the corridor is controllable (with the
exception of a dead-end that we prove to be uncontrollable in Section 3.6). On the
other hand, if the corridor is too narrow, then $I^1$ is equivalent to $I^0$. Formally,
we have $I^1 \equiv (y > -R \land |x| < R) \lor (VT < 2R \land (x > -R \land |y| < R))$.
Moreover, computing $I^2$ gives a result that is equivalent to $I^1$. From this, we
can conclude that $I^1$ is equivalent to $I^n$ for all $n \geq 1$. Intuitively, it is optimal
with respect to any finite fallback strategy (restricted to permanent actions).
The controllable invariant unrolling $I^n$ has a natural stopping criterion.


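Lemma 6. If $I^n \leftrightarrow I^{n+1}$ is valid for some $n \geq 0$, then $I^n \leftrightarrow I^m$ is valid for
all $m \geq n$ and $I^n \leftrightarrow I^\omega$ is valid where $I^\omega \equiv [\mathit{step}^* ; \mathit{forever}]\, \mathit{safe}$.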
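The stopping criterion of Lemma 6 is the usual fixpoint test of an increasing iteration. The sketch below illustrates it on a finite-state abstraction where a "formula" is simply the set of states satisfying it; this set-based encoding, the `unroll` function, and the toy `step_pre` operator are assumptions made for illustration and are not how CESAR represents formulas.

```python
from typing import Callable, FrozenSet, Tuple

Region = FrozenSet[int]  # a formula, represented by the states that satisfy it


def unroll(i0: Region,
           step_pre: Callable[[Region], Region],
           limit: int) -> Tuple[Region, int, bool]:
    """Iterate I^{n+1} = I^n ∪ step_pre(I^n) for at most `limit` rounds.

    Returns (region, n, converged): converged is True when I^n = I^{n+1}
    was observed, in which case further unrolling cannot add states.
    """
    current = i0
    for n in range(limit):
        nxt = current | step_pre(current)
        if nxt == current:            # I^n <-> I^{n+1}: stop here
            return current, n, True
        current = nxt
    return current, limit, False


# Toy abstraction: states 0..5 on a line, the fallback region is {0},
# and one step can move from s to s - 1 for the states s <= 3 only.
def toy_step_pre(region: Region) -> Region:
    return frozenset(s for s in range(6) if s <= 3 and s - 1 in region)


full = unroll(frozenset({0}), toy_step_pre, limit=10)
partial = unroll(frozenset({0}), toy_step_pre, limit=2)
```

In the toy instance the iteration stabilizes once states 0 to 3 are covered, while a limit of 2 returns a smaller region without a convergence guarantee.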

Fig. 1: Robot navigating a corridor (Model 2). A 2D robot must navigate safely
within a corridor with a dead-end without crashing against a wall. The corridor
extends infinitely on the bottom and on the right. The robot can choose between
going left and going down with a constant speed $V$. The left diagram shows $I^0$
in gray. The right diagram shows $I^1$ under the additional assumption $VT < 2R$
($I^1$ and $I^0$ are otherwise equivalent). A darker shade of gray is used for regions
of $I^1$ where only one of the two available actions is safe according to $G^1$.


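Model 2 Robot navigating a corridor with framed solutions of holes.


assum    | 1  $V > 0 \land T > 0$
ctrlable | 2  $\land\ \boxed{(y > -R \land |x| < R) \lor (VT < 2R \land (x > -R \land |y| < R))} \to [\{$
ctrl     | 3  $(\ (?\,\boxed{x > -R + VT} ; v_x := -V ; v_y := 0)$
         | 4  $\cup\ (?\,\boxed{y < R - VT \lor x < R} ; v_x := 0 ; v_y := V)\ ) ;$
plant    | 5  $(t := 0 ; \{x' = v_x,\ y' = v_y,\ t' = 1 \ \&\ t \leq T\})$
safe     | 6  $\}^*]\ ((x > -3R \land |y| < R) \lor (y > -R \land |x| < R))$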


3.6 Proving Optimality via the Dual Game 


Suppose one found a controllable invariant $I$ using techniques from the previous
section. To prove it optimal, one must show that $\vDash \mathit{assum} \to (I^{\mathrm{opt}} \to I)$. By
contraposition and $[\alpha]\, P \leftrightarrow \neg\langle\alpha\rangle \neg P$ ($[\cdot]$), this is equivalent to proving that:


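$$ \vDash \mathit{assum} \land \neg I \to \underbrace{\langle ((\textstyle\bigcap_i \mathit{act}_i) ; \mathit{plant})^* \rangle\, \neg\mathit{safe}}_{\neg I^{\mathrm{opt}}}. \tag{6} $$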


We define the largest uncontrollable region $U^{\mathrm{opt}} \equiv \neg I^{\mathrm{opt}}$ as the right-hand side
of implication (6) above. Intuitively, $U^{\mathrm{opt}}$ characterizes the set of all states from
which the environment (played by Angel) has a winning strategy against the
controller (played by Demon) for reaching an unsafe state. In order to prove the
optimality of $I$, we compute a sequence of increasingly strong approximations $U$
of $U^{\mathrm{opt}}$ such that $U \to U^{\mathrm{opt}}$ is valid. We do so via an iterative process, in the
spirit of how we approximate $I^{\mathrm{opt}}$ via bounded fallback unrolling (Section 3.5),
although the process can be guided by the knowledge of $I$ this time. If at any
point we manage to prove that $\mathit{assum} \to (I \lor U)$ is valid, then $I$ is optimal.

One natural way to compute increasingly good approximations of $U^{\mathrm{opt}}$ is
via loop unrolling. The idea is to improve approximation $U$ by adding states
from where the environment can reach $U$ by running the control loop once,
formally, $\langle (\bigcap_i \mathit{act}_i) ; \mathit{plant} \rangle\, U$. This unrolling principle can be useful. However,
it only augments $U$ with new states that can reach $U$ in time $T$ at most. So
it cannot alone prove optimality in cases where violating safety from an unsafe
state takes an unbounded amount of time.

For concreteness, let us prove the optimality of $I^0$ in the case of Model 1.
In [34], essentially the following statement is proved when arguing for optimality:
$\vDash \mathit{assum} \land \neg I^0 \to \langle (a := -B ; \mathit{plant})^* \rangle\, \neg\mathit{safe}$. This is identical to our optimality
criterion from Eq. (6), except that Demon’s actions are restricted to braking.
Intuitively, this restriction is sound since accelerating always makes things worse
as far as safety is concerned. If the train cannot be saved with braking alone,
adding the option to accelerate will not help a bit. In this work, we propose a
method for formalizing such arguments within dGL for arbitrary systems.

Our idea for doing so is to consider a system made of two separate copies of 
our model. One copy has all actions available whereas the other is only allowed 
a single action (e.g. braking). Given a safety metric $m$ (i.e. a term $m$ such that
$\vDash m \leq 0 \to \neg\mathit{safe}$), we can then formalize the idea that “action $i$ is always better
w.r.t. safety metric $m$” within this joint system.


Definition 5 (Uniform Action Optimality). Consider a finite number of
discrete dL programs $\alpha_i$ and $p \equiv \{x' = f(x) \ \&\ Q\}$. Let $V = \mathrm{BV}(p) \cup \bigcup_i \mathrm{BV}(\alpha_i)$
be the set of all variables written by $p$ or some $\alpha_i$. For any term $\theta$ and integer
$n$, write $\theta^{(n)}$ for the term that results from $\theta$ by renaming all variables $v \in V$ to
a fresh tagged version $v^{(n)}$. Using a similar notation for programs and formulas,
define $p^{(1,2)} \equiv \{(x^{(1)})' = f(x^{(1)}),\ (x^{(2)})' = f(x^{(2)}) \ \&\ Q^{(1)} \land Q^{(2)}\}$. We say that
action $j$ is uniformly optimal with respect to safety metric $m$ if and only if:


$$ \vDash m^{(1)} \geq m^{(2)} \to [\alpha_j^{(1)} ; (\textstyle\bigcup_i \alpha_i^{(2)}) ; p^{(1,2)}]\ m^{(1)} \geq m^{(2)}. $$


$\mathrm{best}_j((\alpha_i)_i, p, m)$ denotes that action $j$ is uniformly optimal with respect to $m$
for actions $\alpha_i$ and dynamics $p$.


With such a concept in hand, we can formally establish the fact that criterion
Eq. (6) can be relaxed in the presence of uniformly optimal actions.


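Theorem 4. Consider a finite number of discrete dL programs $\alpha_i$ such that
$\vDash \langle\alpha_i\rangle\, \mathit{true}$ for all $i$ and $p \equiv \{x' = f(x) \ \&\ q \geq 0\}$. Then, provided that
$\mathrm{best}_j((\alpha_i)_i, p, m)$ and $\mathrm{best}_j((\alpha_i)_i, p, -q)$ (no other action stops earlier because
of the domain constraint), we have:


$$ \vDash \langle ((\textstyle\bigcap_i \alpha_i) ; p)^* \rangle\, m \leq 0 \leftrightarrow \langle (\alpha_j ; p)^* \rangle\, m \leq 0. $$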


A general heuristic for leveraging Theorem 4 to grow $U$ automatically works as
follows. First, it considers $R \equiv \mathit{assum} \land \neg I \land \neg U$ that characterizes states that are
not known to be controllable or uncontrollable. Then, it picks a disjunct $\bigwedge_j R_j$ of
the disjunctive normal form of $R$ and computes a forward invariant region $V$ that
intersects with it: $V \equiv \bigwedge_j \{R_j : \mathit{assum}, R_j \vdash [(\bigcup_i \mathit{act}_i) ; \mathit{plant}]\, R_j\}$. Using $V$ as
an assumption to simplify $\neg U$ may suggest metrics to be used with Theorem 4.
For example, observing $\vDash V \to (\neg U \leftrightarrow (\theta_1 > 0 \land \theta_2 > 0))$ suggests picking
metric $m \equiv \min(\theta_1, \theta_2)$ and testing whether $\mathrm{best}_j(\mathit{act}, p, m)$ is true for some
action $j$. If such a uniformly optimal action exists, then $U$ can be updated as
$U \leftarrow U \lor (V \land \langle (\mathit{act}_j ; \mathit{plant})^* \rangle\, m \leq 0)$. The solution $I^1$ for the corridor (Model 2)
can be proved optimal automatically using this heuristic in combination with
loop unrolling.


3.7 Implementing the Reduction Oracle 


The CESAR algorithm assumes the existence of a reduction oracle that takes as 
an input a loop-free dGL formula and attempts to compute an equivalent formula 
within the fragment of propositional arithmetic. When an exact solution cannot 
be found, an implicant is returned instead and flagged appropriately (Def. 3).
This section discusses our implementation of such an oracle. 

As discussed in Section 3.4, exact solutions can be computed systematically
when all ODEs are solvable by first using the dGL axioms to eliminate modalities
and then passing the result to a quantifier elimination algorithm for first-order
arithmetic [9,42]. Although straightforward in theory, a naive implementation of
this idea hits two practical barriers. First, quantifier elimination is expensive and
its cost increases rapidly with formula complexity [11,44]. Second, the output
of existing QE implementations can be unnecessarily large and redundant. In
iterated calls to the reduction oracle, these problems can compound each other.

To alleviate this issue, our implementation performs eager simplification
at intermediate stages of computation, between some axiom application and
quantifier-elimination steps. This optimization significantly reduces output solution
size and allows CESAR to solve in 26s a benchmark that would otherwise time out
after 20 minutes. Appendix E further discusses the impact of eager
simplification. Still, the doubly exponential complexity of quantifier elimination
puts a limit on the complexity of problems that CESAR can currently tackle.

In the general case, when ODEs are not solvable, our reduction oracle is still
often able to produce approximate solutions using differential invariants
generated automatically by existing tools. Differential invariants are formulas that
stay true throughout the evolution of an ODE system. To see how they apply,
consider the case of computing $\mathrm{reduce}([\{x' = f(x)\}]\, P, A)$ where $P$ is the
postcondition formula that must be true after executing the differential equation,
and $A$ is the assumptions holding true initially. Suppose that formula $D(x)$ is a
differential invariant such that $D(x) \to P$ is valid. Then, a precondition sufficient
to ensure that $P$ holds after evolution is $A \to D(x)$. For example, to compute the
precondition for the dynamics of the parachute benchmark, our reduction oracle
first uses the Pegasus tool [38] to identify a Darboux polynomial, suggesting
an initial differential invariant $D_0$. Once we have $D_0$, the additional information
required to conclude postcondition $P$ is $D_0 \to P$. To get an invariant formula
that implies $D_0 \to P$, eliminate all the changing variables $\{x, v\}$ in the formula
$\forall x\, \forall v\, (D_0 \to P)$, resulting in a formula $D_1$. $D_1$ is a differential invariant since it
features no variable that is updated by the ODEs. Our reduction oracle returns
$D_0 \land D_1$, an invariant that entails postcondition $P$.


Footnote: dGL provides ways to reason about differential invariants without solving the
corresponding differential equation. For example, for an invariant of the form $e = 0$, the
differential invariant axiom is $[\{x' = f(x)\}]\, e = 0 \leftrightarrow (e = 0 \land [\{x' = f(x)\}]\, e' = 0)$.


3.8 The CESAR Algorithm 


The CESAR algorithm for synthesizing control envelopes is summarized in
Algorithm 1. It is expressed as a generator that yields a sequence of solutions with
associated optimality guarantees. Possible guarantees include “sound” (no
optimality guarantee, only soundness), “k-optimal” (sound and optimal w.r.t. all
k-switching fallbacks with permanent actions), “ω-optimal” (sound and
optimal w.r.t. all finite fallbacks with permanent actions) and “optimal” (sound and
equivalent to $S^{\mathrm{opt}}$). Line 11 performs the optimality test described in Section 3.6.
Finally, Line 10 performs an important soundness check for the cases where an
approximation has been made along the way of computing $(I^n, G^n)$. In such
cases, $I$ is not guaranteed to be a controllable invariant and thus Case (2) of
Def. 1 must be checked explicitly.

When given a problem with solvable ODEs and provided with a complete QE 
implementation within reduce, CESAR is guaranteed to generate a solution in 
finite time with an “n-optimal” guarantee at least (n being the unrolling limit). 


4 Benchmarks and Evaluation 


To evaluate our approach to the Control Envelope Synthesis problem, we curate a
benchmark suite with diverse optimal control strategies. As Table 2 summarizes,
some benchmarks have non-solvable dynamics, while others require a sequence
of clever control actions to reach an optimal solution. Some have state-dependent
fallbacks where the current state of the system determines which action is “safer”,
and some are drawn from the literature. We highlight a couple of benchmarks
here. See Appendix D for a discussion of the full suite and the synthesized
results, and [20] for the benchmark files and evaluation scripts.

Power Station is an example where the optimal control strategy involves 
two switches, corresponding to two steps of unrolling. A power station can ei- 
ther produce power or dispense it to meet a quota, but never give out more 
than it has produced. Charging is the fallback action that is safe for all time 
after the station has dispensed enough power. However, to cover all controllable 
states, we need to switch at least two times, so that the power station has a 
chance to produce energy and then dispense it, before settling back on the safe 
fallback. Parachute is an example of a benchmark with non-solvable, hyperbolic 
dynamics. A person jumps off a plane and can make an irreversible choice to 
open their parachute. The objective is to stay within a maximum speed that is 
greater than the terminal velocity when the parachute is open. 

Algorithm 1 CESAR: Control Envelope Synthesis via Angelic Refinements


 1: Input: a synthesis problem (as defined in Section 3.1), an unrolling limit $n$.
 2: Remark: valid is defined as $\mathrm{valid}(F, A) \equiv (\mathrm{first}(\mathrm{reduce}(\neg F, A)) = \mathit{false})$.
 3: $k \leftarrow 0$
 4: $I, e_I \leftarrow \mathrm{reduce}([\mathit{forever}]\, \mathit{safe}, \mathit{assum})$
 5: while $k \leq n$ do
 6:     $e_G \leftarrow \mathit{true}$
 7:     for each $i$ do
 8:         $G_i, e \leftarrow \mathrm{reduce}([\mathit{act}_i ; \mathit{plant}]\, I, \mathit{assum})$
 9:         $e_G \leftarrow e_G$ and $e$
10:     if ($e_G$ and $e_I$) or $\mathrm{valid}(I \to \bigvee_i G_i, \mathit{assum})$ then
11:         if $e_G$ and $\mathrm{optimal}(I)$ then
12:             yield $((I, G),$ “optimal”$)$
13:             return
14:         else if $e_G$ and $e_I$ then yield $((I, G),$ “k-optimal”$)$
15:         else yield $((I, G),$ “sound”$)$
16:     $I', e \leftarrow \mathrm{reduce}(I \lor [\mathit{step}]\, I, \mathit{assum})$
17:     $e_I \leftarrow e_I$ and $e$
18:     if $e_G$ and $e_I$ and $\mathrm{valid}(I' \to I, \mathit{assum})$ then
19:         yield $((I, G),$ “ω-optimal”$)$
20:         return
21:     $I \leftarrow I'$
22:     $k \leftarrow k + 1$
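To make the control flow of Algorithm 1 concrete, here is a minimal Python sketch of the generator. It is not the authors' Scala implementation: formulas are modelled as finite sets of states, validity of an implication becomes set inclusion, the assumption is taken to be already folded into the state space, and the hooks `fallback`, `guard`, `unroll` and `optimal` are hypothetical stand-ins for the calls to reduce and for the optimality test of Section 3.6. The toy instance at the bottom (levels 0 to 4 with level 4 unsafe) is likewise invented for illustration.

```python
from typing import Callable, Dict, FrozenSet, Iterator, List, Tuple

Region = FrozenSet[int]                      # a formula as the set of its states
Solution = Tuple[Region, Dict[str, Region]]  # (invariant I, guards G)


def cesar(actions: List[str],
          fallback: Callable[[], Tuple[Region, bool]],
          guard: Callable[[str, Region], Tuple[Region, bool]],
          unroll: Callable[[Region], Tuple[Region, bool]],
          optimal: Callable[[Region], bool],
          limit: int) -> Iterator[Tuple[Solution, str]]:
    """Mirror of Algorithm 1; every hook returns (region, exact_flag)."""
    k = 0
    inv, e_i = fallback()                          # line 4
    while k <= limit:                              # line 5
        e_g = True
        guards: Dict[str, Region] = {}
        for a in actions:                          # lines 7-9
            guards[a], e = guard(a, inv)
            e_g = e_g and e
        covered = inv <= frozenset().union(*guards.values())
        if (e_g and e_i) or covered:               # line 10
            if e_g and optimal(inv):               # line 11
                yield (inv, guards), "optimal"
                return
            elif e_g and e_i:
                yield (inv, guards), f"{k}-optimal"
            else:
                yield (inv, guards), "sound"
        nxt, e = unroll(inv)                       # line 16
        e_i = e_i and e
        if e_g and e_i and nxt <= inv:             # line 18
            yield (inv, guards), "omega-optimal"
            return
        inv = nxt
        k += 1


# Toy instance (assumption): "down" lowers the level, "up" raises it.
STATES = range(5)
SAFE = frozenset({0, 1, 2, 3})
ACTIONS = ["down", "up"]


def move(action: str, s: int) -> int:
    return max(s - 1, 0) if action == "down" else min(s + 1, 4)


def toy_fallback() -> Tuple[Region, bool]:
    return SAFE, True          # playing "down" forever stays inside SAFE


def toy_guard(action: str, inv: Region) -> Tuple[Region, bool]:
    return frozenset(s for s in STATES if move(action, s) in inv), True


def toy_unroll(inv: Region) -> Tuple[Region, bool]:
    more = frozenset(s for s in SAFE if any(move(a, s) in inv for a in ACTIONS))
    return inv | more, True


labels = [lab for _, lab in cesar(ACTIONS, toy_fallback, toy_guard,
                                  toy_unroll, lambda inv: False, limit=2)]
```

When the optimality test fails, the toy run first reports a 0-optimal solution and then stops with the ω-optimal label because unrolling adds no states; passing an `optimal` hook that succeeds ends the run at the first yield.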


We implement CESAR in Scala, using Mathematica for simplification and
quantifier elimination, and evaluate it on the benchmarks. Simplification is an
art [2523]. We implement additional simplifiers with the Egg library [45] and
the SMT solver Z3 [30]. Experiments were run on an M2 MacBook Pro with
32GB of RAM. CESAR execution times are averaged over 5 runs.

CESAR synthesis is automatic. The optimality tests were computed manually.
Table 2 summarizes the results of running CESAR. Despite a variety of
different control challenges, CESAR is able to synthesize safe and in some cases
also optimal safe control envelopes within a few minutes. As an extra step of
validation, synthesized solutions are checked by the hybrid system theorem prover
KeYmaera X [16]. All solutions are proved correct, with verification time as
reported in the last column of Table 2.


5 Related Work 


Hybrid controller synthesis has received significant attention [26,41,7], with
popular approaches using temporal logic [5,7,46], games [31,43], and CEGIS-like
guidance from counterexamples [39,1,37,10]. CESAR, however, solves the
ent problem of synthesizing control envelopes that strive to represent not one 
but all safe controllers of a system. Generating valid solutions is not an issue (a 
trivial solution always exists that has an empty controllable set). The real chal- 
lenge is optimality which imposes a higher order constraint because it reasons 


160 Aditi Kabra, Jonathan Laurent, Stefan Mitsch, and André Platzer 


Table 2: Summary of CESAR experimental results


Benchmark       Synthesis   Checking   Optimal / Needs Unrolling /
                Time (s)    Time (s)   Non-Solvable Dynamics
ETCS Train      14          9          ✓
Sled            20          8          ✓
Intersection    49          44         ✓
Parachute       46          8          ✓
Curvebot        26          9          ✓
Coolant         49          20         ✓ ✓
Corridor        20          8          ✓ ✓
Power Station   26          17         ✓ ✓

about the relationship between possible valid solutions, and cannot, e.g., fit in
the CEGIS quantifier alternation pattern $\exists\forall$. So simply adapting existing
controller synthesis techniques does not solve symbolic control envelope synthesis.

Safety shields computed by numerical methods [211324] serve a similar func- 
tion to our control envelopes and can handle dynamical systems that are hard 
to analyze symbolically. However, they scale poorly with dimensionality and do 
not provide rigorous formal guarantees due to the need of discretizing continuous 
systems. Compared to our symbolic approach, they cannot handle unbounded 
state spaces (e.g. our infinite corridor) nor produce shields that are parametric 
in the model's parameters without hopelessly increasing dimensionality. 

On the optimality side, a systematic but manual process was used to design 
a safe European Train Control System (ETCS) and justify it as optimal with re- 
spect to specific train criteria [34]. Our work provides the formal argument filling 
the gap between such case-specific criteria and end-to-end optimality. CESAR 
is more general and automatic. 


6 Conclusion 


This paper presents the CESAR algorithm for Control Envelope Synthesis via 
Angelic Refinements. It is the first approach to automatically synthesize symbolic 
control envelopes for hybrid systems. The synthesis problem and optimal solu- 
tion are characterized in differential game logic. Through successive refinements, 
the optimal solution in game logic is translated into a controllable invariant and 
control conditions. The translation preserves safety. For the many cases where 
refinement additionally preserves optimality, an algorithm to test optimality of 
the result post translation is presented. The synthesis experiments on a bench- 
mark suite of diverse control problems demonstrate CESAR’s versatility. For 
future work, we plan to extend to additional control shapes, and to exploit the 
synthesized safe control envelopes for reinforcement learning. 

Abstract. For developing safe automated systems, recognizing safety-critical
situations in data from their complex operational domain is
perative. This capability is, for example, essential when evaluating the 
system's conformance to specified requirements in test run data. The 
requirements involve a temporal dimension, as the system operates over 
time. Moreover, the generated data are usually relational and require 
additional background knowledge about the domain for correctly
recognizing the situation. This fact makes propositional temporal logics, an
established tool, unsuitable for the task. We address this issue by de- 
veloping a tailored temporal logic to query for situations in relational 
data over complex domains. Our language combines mission-time lin- 
ear temporal logic with conjunctive queries to access time-stamped data 
with background knowledge formulated in an expressive description logic. 
Currently, however, no tools exist for answering queries in such settings. 
We hence also contribute an implementation in the logic reasoner OPEN- 
LLET, leveraging the efficacy of well-established conjunctive query an- 
swering. Moreover, we present a benchmark generator in the setting of 
automated driving and demonstrate that our tool performs well when 
tasked with recognizing safety-critical situations in road traffic. 


Keywords: Temporal Conjunctive Queries · Description Logics · Temporal Logics.

1 Introduction


Recent technological advances in, e.g., sensors and computer vision, have boosted
the development of automated systems performing safety-critical tasks
in complex domains. These systems are expected to safely operate without hu- 
man intervention in these contexts. Consider, for example, automated driving 
systems (ADSs), where the responsibility of safely navigating the environment 
lies fully with the system [35]. The combination of their safety-critical nature 
and the complex operational domain makes it hard to guarantee the absence 
of unreasonable risk before public release, which is, however, required by many 
homologation authorities. Alas, correct-by-design techniques are rendered
inapplicable by the high system complexity. Thus, manufacturers must resort to
empirically assessing the system's risk prior to deployment. As automated
systems interact with their environment over time, a promising approach for risk
assessment is to decompose the complex operational domain into finite-time se- 
quences ("scenarios") [34]. Safety requirements — aiming to mitigate unreasonable 
risks — are then specified for these scenarios. Hence, a formal specification of the 
actors’ temporal behavior becomes essential. An exemplary requirement reads
as follows: ‘In situations where the absence of pedestrians is not guaranteed,
adapt the speed appropriately.’ Note that this rule consists of a premise (the
situation) and a consequence (the behavior). The number of situations to write
requirements for can be enormous, e.g., occlusions [42], violating the safety
distance [43], and maneuvers such as passing parking vehicles [11]. Due to their
large number, testing is the most widely used option for verification, i.e., for checking
the system's conformance with requirements. For this, data of test runs of the
system operating within its environment are recorded. Adherence to the
requirements is then evaluated by recognizing the situation (‘no guaranteed absence of
pedestrians’) and testing for the implied behavior (‘adapted speed’). We argue
that this approach has three requirements:


Relational and Temporal Domain Formally modelling traffic situations in- 
herently requires a relational language since they refer to individuals and 
their relationships, e.g., drives. Moreover, the number of individuals is not 
fixed beforehand. Finally, scenarios over such situations involve the descrip- 
tion of temporal aspects. A typical example is the process of overtaking. 

Rich Background Knowledge We do not assume that the data is complete 
in the sense that we can observe all facts about all individuals. Instead, 
we assume to have rich knowledge about the relations used in the situation 
descriptions. Examples for this are: 

— a Driver is equivalent to a Human which drives some Vehicle, or 

— a Driver is never a Pedestrian. 
Such knowledge must be included since otherwise situations may not be 
correctly recognized in the data and test evaluation produces false results. 

Formal Specifications of Properties It is established that specifying and 
testing requirements benefits greatly from formal approaches. Standard re- 
quirement formalization languages, like linear temporal logic, are however 
propositional and thus unsuitable for our purposes. 

An established way to address the first two aspects is to model situations via
temporal knowledge bases K = (O, D) which consist of a domain ontology O that
describes the background knowledge and a temporal database D that describes the
evolution of the situation over time. Formally, D is a sequence $D = (D_0, \ldots, D_n)$
of time-stamped databases. Note that, in using temporal knowledge bases, we
adopt the open world assumption (OWA), which intuitively says that the true
facts are not only those in D but those that are entailed by O and D.

To address the third aspect above, i.e., to formally specify properties, we
use a suitable extension of linear temporal logic (LTL). Recall that LTL is a
language for describing properties over a set of propositions by using modalities
such as $\Diamond\varphi$ ($\varphi$ holds eventually), $\Box\varphi$ ($\varphi$ holds globally), $\varphi_1\, \mathcal{U}\, \varphi_2$ ($\varphi_1$ holds until
$\varphi_2$), and $\bigcirc\varphi$ ($\varphi$ holds in the next step). Unfortunately, this does not suffice when
working over relational data. A natural option to extend LTL in the required way
is to replace propositions by queries. In this work, we use conjunctive queries
(CQs). CQs are one of the most common query languages for databases and
expressively equivalent to the SELECT-FROM-WHERE fragment of SQL. For
example, we can ask for all drivers d of a vehicle by the CQ $\exists v.\, \mathit{Vehicle}(v) \land \mathit{drives}(d, v)$
with one existentially quantified variable v and one answer variable
d. In terms of the temporal expressivity, our application further requires that


(1) we operate on finite traces whose length is bounded by the length of the 
temporal database D specified in the temporal knowledge base, 

(2) as duration constraints are used in specifications, e.g., to distinguish maneu- 
vers of certain lengths, we incorporate metric operators, and 

(3) we analyze the data a-posteriori. Hence we are not in a run-time verification 
setting and require only future time operators. 
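The driver query from the paragraph above, combined with an "eventually" operator over a finite trace, can be illustrated as follows. This is a deliberately simplified sketch: it evaluates the query directly on the stored facts (a closed-world reading) and ignores the ontology, whereas the setting described here adopts the open world assumption and entailment with respect to background knowledge. The dictionary-based database layout and all names in the code are assumptions made for the example.

```python
from typing import Callable, Dict, List, Set, Tuple

# One time-stamped database: relation name -> set of tuples of individuals.
Database = Dict[str, Set[Tuple[str, ...]]]


def drivers(db: Database) -> Set[str]:
    """Answers d of the CQ  ∃v. Vehicle(v) ∧ drives(d, v)  over one database."""
    vehicles = {v for (v,) in db.get("Vehicle", set())}
    return {d for (d, v) in db.get("drives", set()) if v in vehicles}


def eventually(trace: List[Database],
               query: Callable[[Database], Set[str]]) -> Set[str]:
    """Individuals answering `query` at some time point of the finite trace."""
    answers: Set[str] = set()
    for db in trace:
        answers |= query(db)
    return answers


# A two-step trace: nobody drives at time 0, alice drives car1 at time 1.
trace: List[Database] = [
    {"Vehicle": {("car1",)}, "drives": set()},
    {"Vehicle": {("car1",)}, "drives": {("alice", "car1"), ("bob", "bike1")}},
]
```

Here bob is not an answer because bike1 is not recorded as a Vehicle; under an ontology stating that bikes are vehicles, an open-world evaluation could reach a different conclusion, which is the gap the background knowledge is meant to close.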


We term the resulting language metric temporal conjunctive queries (MTCQs),
which features both unbounded and bounded future time operators over finite
traces and uses CQs in its atoms and is based on Mission-Time LTL (MLTL) [29].
MTCQs can, for example, express properties like 86” (x) = O-Pedestrian(r), 
asking for all individuals x that are eventually not a pedestrian. A more involved 
MTCQ asking for all z that move past a parking vehicle y on a two-lane road is 


P (x,y) = O(ar.Vehicle(x) ^2. Lane Road(r) ^ intersects(r,x)A 
Parking Vehicle(y)) ^O(in front of(y,z)^O 


((in proximity(r,y)^ to the side of(y,x))Ubehind(y, x))). 


Recognizing such a situation for checking a requirement translates to the task
of evaluating an MTCQ $\varphi(\vec{x})$ with answer variables $\vec{x}$ over a temporal knowledge
base K. Informally, if we want to verify that a tuple of individuals $\vec{a}$ conforms
to some specification $\varphi(\vec{x})$ in a situation K = (O, D), we have to check whether
the entailment $(O, D) \vDash \varphi(\vec{a})$ is true, cf. Section 3 for precise definitions.

This task obviously depends on the chosen ontology language. For this, we 
use description logics (DLs), an established knowledge representation formalism, 
which offers a good compromise between complexity and expressivity [10]. Our 
approach works up to the $\mathcal{SROIQ}^{(D)}$ fragment of DLs. It is close to the formalism
behind the Web Ontology Language (OWL) 2, an expressive and widespread DL 
language. The mentioned task of entailment has been studied for DL temporal 
knowledge bases and a different yet related extension of LTL [8], cf. Section 2. 

We now illustrate this setup by means of a simple example. A DL ontology
$\mathcal{O}$ is a set of concept inclusions $C \sqsubseteq D$ for concept descriptions $C$ and $D$. We also
write $C \equiv D$ to denote concept equivalence. DLs allow for arbitrary names as
basic concepts. We have special names for nothing ($\bot$) and all things ($\top$). Besides
concepts, DLs also allow so-called roles (relations) between concepts. From these,
we can inductively build new concepts. For an example ontology $\mathcal{O}^{ex}$, we might
state that every driver is a human by $\mathit{Driver} \sqsubseteq \mathit{Human} \in \mathcal{O}^{ex}$. To illustrate the
combination of roles and concepts, we define drivers as the intersection (using the
$\sqcap$-operator) of all humans and all things that drive some (using the $\exists$-operator)
vehicle, written as $\mathit{Driver} \equiv \mathit{Human} \sqcap \exists\mathit{drives}.\mathit{Vehicle} \in \mathcal{O}^{ex}$. We can use $\bot$ to
express that pedestrians and drivers are disjoint: $\mathit{Driver} \sqcap \mathit{Pedestrian} \sqsubseteq \bot \in
\mathcal{O}^{ex}$.

These operators may be enough for simple domains. However, knowledge
about relations in complex domains is often involved, in which case even more
expressive operators can be allowed. For example, the MTCQ $\Phi(x, y)$ above requires
recognizing situations of passing parking vehicles. Here, expressive DLs allow modeling
two-lane roads to have exactly two lanes (by the concept $=2\,\mathit{has\_lane}.\mathit{Lane}$)
and be a road (by the concept $\mathit{Road} \sqcap\, =2\,\mathit{has\_lane}.\mathit{Lane}$). Moreover, parking
vehicles are standing (with a speed of the datatype literal 0.0) dynamical objects
on a parking spot. This is expressed by the following DL ontology:


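— $\mathit{2\_Lane\_Road} \equiv \mathit{Road} \sqcap\, =2\,\mathit{has\_lane}.\mathit{Lane}$

— $\mathit{Vehicle} \sqcap \mathit{Standing\_Dynamical\_Object} \sqcap \exists\mathit{intersects}.\mathit{Parking\_Spot} \sqsubseteq
\mathit{Parking\_Vehicle}$

— $\mathit{Parking\_Spot} \equiv \mathit{Parking\_Lane} \sqcup \mathit{Walkway}$

— $\mathit{Standing\_Dynamical\_Object} \equiv \mathit{Dynamical\_Object} \sqcap \exists\mathit{has\_speed}.\{0.0\}$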


Let us now use the simple example to give an intuition on the semantics of
MTCQs over DL ontologies. First, we create an exemplary database with facts
over so-called individuals (concrete objects that are perceived). For example, we
can assert for the first time point that the individual $h$ is a human driving the
individual $v$, a vehicle, by writing the facts as $\mathcal{D}^{ex}_0 = \{\mathit{Human}(h), \mathit{drives}(h, v),
\mathit{Vehicle}(v)\}$. Next, we may perceive $\mathcal{D}^{ex}_1 = \emptyset$, i.e., no information at all. Together
with the ontology, it forms a temporal knowledge base $\mathcal{K}^{ex} = (\mathcal{O}^{ex}, (\mathcal{D}^{ex}_0, \mathcal{D}^{ex}_1))$.
If we query $\mathcal{K}^{ex}$ w.r.t. $\Phi^{ex}_1(x) = \Diamond\neg\mathit{Pedestrian}(x)$, we get $h$ as the only answer,
as $h$ is a driver in $\mathcal{D}^{ex}_0$ and the ontology states that drivers can never be pedestrians.
However, if we change the query to $\Phi^{ex}_2(x) = \Box\neg\mathit{Pedestrian}(x)$, no individual
satisfies the constraint, since $\mathcal{D}^{ex}_1$ asserts nothing: it may well be
that $h$ has become a pedestrian (due to the OWA).

This example highlights that languages like MTCQs are important for testing 
requirements on systems in complex domains. However, up to now, only the 
theoretical work by Baader et al. examines a related but hard-to-implement 
setting over infinite traces for complexity-theoretic analyses [8]. No language 
has yet been defined that is practically suitable for implementation and has the
required expressiveness. Moreover, there currently is no tooling for any temporal 
query language over expressive DLs. Our work on MTCQs addresses this gap. 


For this, we first introduce the formal foundation of MTCQs in Section 3. 
We implement the framework in an answering engine for a large and practically 
relevant subclass of MTCQs in Section 4, closing the identified research gap. To 
evaluate its efficacy, we present a benchmark generator for temporal knowledge 
bases, as described in Section 5. We show the efficacy of our tool in this practical 
setting in Section 6. To summarize, the main contributions of our work are 


1. MTCQs as a practically implementable and expressive temporal query language
and the first tool for answering such queries up to the DL $\mathcal{SROIQ}^{(D)}$,


2. a benchmark generator for the evaluation of inference tasks on temporal 
knowledge bases, and 


3. an application of the tool in our motivational setting of situation recognition 
for urban automated driving. 


2 Related Work 


As claimed in the introduction, the usefulness of temporal logics (TLs) and
related mechanisms, e.g., regular expressions, for scenario extraction has been
recognized in our motivational domain of ADS development, which is supported by the
literature [26, 31, 18, 16]. More specifically, work exists in specifying behavioral
requirements, e.g., based on traffic rules, using TLs [1, 33, 19]. However, none of 
these approaches formally incorporate an ontology. In general, the importance of 
ontologies in automated driving is recognized, see, e.g., ASAM OpenXOntology 
[7] for an international standardization project as well as Westhofen et al. [42] 
and Zipfl et al. [44] for non-systematic reviews. Some ontological approaches 
are in fact based on DLs [27]. However, we are not aware of work within the 
automotive domain that uses DLs and TLs for analyzing temporal traffic data. 


On the theoretical side, a plethora of temporal DLs have been introduced [2, 
32, 5], also on finite traces [6]. These classical combinations were not conceived in 
a query answering context, so more recently, several frameworks for addressing 
that have been introduced [3]. We mention the most important ones here. There
is work on ontologies formulated in the lightweight (i.e., comparatively inexpressive)
DLs DL-Lite [12, 38] and $\mathcal{EL}$ [13, 22]. For expressive DLs, an important
line of work theoretically examines answering temporal conjunctive queries (essentially
infinite-time LTL over conjunctive queries) over temporal knowledge
bases with the ontology language ranging from $\mathcal{ALC}$ [8] to $\mathcal{SHQ}$ [30, 9]. Related,
but orthogonal to combinations of DLs with TLs, are combinations of Datalog
with TLs. This line of research started around 1990 with Datalog$_{1S}$ [15] and
led to other combinations [14, 39] for which tools also exist [40].
3 Formal Foundations


We introduce the formal foundations of the relevant DLs and their temporal 
extension. For the sake of simplicity, we focus on the ontology language ALC, 
which is a prototypical language in the class of expressive DLs. However, our 
approach generalizes to (and is actually implemented for) the more expressive 
logic $\mathcal{SROIQ}^{(D)}$, cf. Horrocks et al. for further reference on this DL fragment [24].

We start with an introduction to non-temporal knowledge bases, which we
later use as a foundation for defining the temporal case. As sketched in Section 1,
in $\mathcal{ALC}$ we can describe the relationship of roles and concepts in an ontology $\mathcal{O}$
and assert individuals to these concepts and roles in a database $\mathcal{D}$. Any knowledge
base is thus a tuple $(\mathcal{O}, \mathcal{D})$ and relies upon concept, role, and individual
names. For the remainder, we fix countably infinite supplies $N_C$, $N_R$, $N_I$ of concept,
role, and individual names, respectively. An $\mathcal{ALC}$-concept description $C$ is
formed according to $C ::= A \mid \neg C \mid C \sqcap C \mid C \sqcup C \mid \forall r.C \mid \exists r.C$ where $A$ ranges over
$N_C$ and $r$ ranges over $N_R$. We can thus compose new concepts using negation,
intersection, and union. For a role $r$, we moreover allow for universal (enforcing a
concept to only have $r$-successors in $C$) and existential quantification (enforcing a
concept to have an $r$-successor in $C$). Section 1 already introduced an example of
an existentially quantified role using $\exists\mathit{drives}.\mathit{Vehicle}$, the concept of all things
driving some vehicle. An ontology is a set of concept inclusions $C \sqsubseteq D$ for $\mathcal{ALC}$-concepts
$C$ and $D$, denoting subsumption of the concept $C$ by the concept $D$. We
write $C \equiv D$ (concept equivalence) for $C \sqsubseteq D$ and $D \sqsubseteq C$. Again, the introduction
used $\mathit{Human} \sqcap \exists\mathit{drives}.\mathit{Vehicle} \equiv \mathit{Driver}$ as an example for concept equivalence.
The data is a set of facts of the form $A(a)$ and $r(a, b)$ for $a, b \in N_I$, $r \in N_R$, and
$A \in N_C$, hence assigning individuals to concepts and roles. We denote the set of
individuals that occur in $\mathcal{D}$ by $\mathrm{Ind}(\mathcal{D})$. The introductory example of Section 1
used the set of individuals $\{h, v\}$ and asserted the role $\mathit{drives}(h, v)$.

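The semantics of ontologies and data is defined via interpretations $\mathcal{I} =
(\Delta^{\mathcal{I}}, \cdot^{\mathcal{I}})$ consisting of a domain $\Delta^{\mathcal{I}}$ and a mapping $\cdot^{\mathcal{I}}$ that assigns a set $A^{\mathcal{I}} \subseteq \Delta^{\mathcal{I}}$ to
every $A \in N_C$, a binary relation $r^{\mathcal{I}} \subseteq \Delta^{\mathcal{I}} \times \Delta^{\mathcal{I}}$ to every role name $r \in N_R$,
and an element $a^{\mathcal{I}} \in \Delta^{\mathcal{I}}$ to every $a \in N_I$ [10, Chapter 2.2]. To incorporate
$\mathcal{ALC}$-concept descriptions, the interpretation function is inductively extended as:


$(\neg C)^{\mathcal{I}} := \Delta^{\mathcal{I}} \setminus C^{\mathcal{I}}$
$(C \sqcap D)^{\mathcal{I}} := C^{\mathcal{I}} \cap D^{\mathcal{I}}$
$(C \sqcup D)^{\mathcal{I}} := C^{\mathcal{I}} \cup D^{\mathcal{I}}$
$(\forall r.C)^{\mathcal{I}} := \{c \in \Delta^{\mathcal{I}} \mid \forall d \in \Delta^{\mathcal{I}}.\ (c, d) \in r^{\mathcal{I}} \rightarrow d \in C^{\mathcal{I}}\}$
$(\exists r.C)^{\mathcal{I}} := \{c \in \Delta^{\mathcal{I}} \mid \exists d \in \Delta^{\mathcal{I}}.\ (c, d) \in r^{\mathcal{I}} \wedge d \in C^{\mathcal{I}}\}$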

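Then, we say $\mathcal{I} \models C \sqsubseteq D$ if $C^{\mathcal{I}} \subseteq D^{\mathcal{I}}$, $\mathcal{I} \models A(a)$ if $a^{\mathcal{I}} \in A^{\mathcal{I}}$, and $\mathcal{I} \models r(a, b)$ if
$(a^{\mathcal{I}}, b^{\mathcal{I}}) \in r^{\mathcal{I}}$. To lift these definitions to ontologies and data, we write $\mathcal{I} \models \mathcal{O}$
and $\mathcal{I} \models \mathcal{D}$ if $\mathcal{I}$ satisfies all concept inclusions in $\mathcal{O}$ resp. all assertions in
$\mathcal{D}$. Finally, for a complete knowledge base, we define $\mathcal{I} \models (\mathcal{O}, \mathcal{D})$ if $\mathcal{I} \models \mathcal{O}$ and
$\mathcal{I} \models \mathcal{D}$. More details on the semantics of DLs are given by Baader et al. [10].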
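
The inductive definition above can be made concrete on finite interpretations. The following is a minimal sketch, not part of the described system: concepts are encoded as nested tuples, and the names `Interpretation`, `ext`, and `satisfies_inclusion` are illustrative assumptions.

```python
from dataclasses import dataclass

# ALC concepts as nested tuples (an encoding chosen for this sketch):
#   ("name", A), ("not", C), ("and", C, D), ("or", C, D),
#   ("all", r, C), ("some", r, C)

@dataclass
class Interpretation:
    domain: frozenset
    concepts: dict  # concept name -> set of domain elements
    roles: dict     # role name -> set of (element, element) pairs

def ext(c, interp):
    """Return the extension of concept c in a finite interpretation."""
    tag = c[0]
    if tag == "name":
        return set(interp.concepts.get(c[1], set()))
    if tag == "not":
        return set(interp.domain) - ext(c[1], interp)
    if tag == "and":
        return ext(c[1], interp) & ext(c[2], interp)
    if tag == "or":
        return ext(c[1], interp) | ext(c[2], interp)
    pairs = interp.roles.get(c[1], set())
    filler = ext(c[2], interp)
    if tag == "all":   # every r-successor lies in the filler
        return {x for x in interp.domain
                if all(d in filler for (s, d) in pairs if s == x)}
    if tag == "some":  # at least one r-successor lies in the filler
        return {x for x in interp.domain
                if any(d in filler for (s, d) in pairs if s == x)}
    raise ValueError(f"unknown constructor: {tag}")

def satisfies_inclusion(c, d, interp):
    """Check whether the interpretation satisfies the inclusion of c in d."""
    return ext(c, interp) <= ext(d, interp)

# The driver example from Section 1 on a two-element domain.
example = Interpretation(
    domain=frozenset({"h", "v"}),
    concepts={"Human": {"h"}, "Vehicle": {"v"}, "Driver": {"h"}},
    roles={"drives": {("h", "v")}},
)
driver_def = ("and", ("name", "Human"), ("some", "drives", ("name", "Vehicle")))
```

Here `ext(driver_def, example)` yields the single element `h`, so both directions of the equivalence between `Driver` and its definition hold in this interpretation.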
We now extend this definition of non-temporal knowledge bases to the temporal
case, where a knowledge base consists of an ontology $\mathcal{O}$ and a finite sequence
of assertions that describe the databases over time. 


Definition 1 (Temporal Knowledge Base). A temporal knowledge base
(KB) is a tuple $\mathcal{K} = (\mathcal{O}, (\mathcal{D}_i)_{i \in \{0,\ldots,n\}})$ where $\mathcal{O}$ is an ontology and each $\mathcal{D}_i$
is a database.


Their semantics is defined by temporal interpretations using the non-temporal
case as its basis. 


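Definition 2 (Temporal Interpretation). A temporal interpretation $\mathfrak{J}$ is a
finite sequence $\mathfrak{J} = (\mathcal{I}_i)_{i \in \{0,\ldots,m\}}$ of interpretations over a fixed domain $\Delta$ such
that $a^{\mathcal{I}_i} = a^{\mathcal{I}_j}$ for all $a \in N_I$ and $0 \le i, j \le m$. We call $\mathfrak{J}$ a model of the temporal
KB $\mathcal{K} = (\mathcal{O}, (\mathcal{D}_i)_{i \in \{0,\ldots,n\}})$, written $\mathfrak{J} \models \mathcal{K}$, if $m = n$ and $\mathcal{I}_i \models \mathcal{D}_i$ and $\mathcal{I}_i \models \mathcal{O}$ for
all $i \in \{0, \ldots, n\}$.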


The assumption that all interpretations share a common domain is called con- 
stant domain assumption. We define next the language MTCQ that we use to 
query temporal KBs. It is a combination of standard conjunctive queries with 
temporal operators inspired by MLTL [29]. 


Definition 3 (Syntax of MTCQs). Let $N_V$ be a countably infinite set of variable
names. A conjunctive query (CQ) $\varphi$ is an expression of the form $\varphi(\vec{x}) =
\exists\vec{y}.\,\psi(\vec{x}, \vec{y})$ where $\vec{x}, \vec{y}$ are tuples of variables from $N_V$ and $\psi$ is a conjunction of
concept atoms $A(t)$ and role atoms $r(t, t')$ with $A \in N_C$, $r \in N_R$, and $t, t' \in
\vec{x} \cup \vec{y} \cup N_I$. Metric temporal conjunctive queries (MTCQs) $\Phi$ are built from CQs
using negation $\neg\Phi$, conjunction $\Phi \wedge \Phi'$, and two versions of until, $\Phi\,U\,\Phi'$ and
$\Phi\,U_{[a,b]}\,\Phi'$ for $a, b \in \mathbb{N}$. We denote with $\mathrm{Ind}(\Phi)$ the set of individuals and $\mathrm{Var}(\Phi)$
the set of variables in an MTCQ $\Phi$.


Note that we extend MLTL by borrowing the unconstrained until operator from 
LTL, because it is a frequent operator in practice. Additionally, it allows for 
a more direct translation to finite automata in our system presented later on. 
We call the variables $\vec{x}$ the answer variables and $\vec{y}$ the quantified variables. An
MTCQ is Boolean if it does not have answer variables. The semantics of Boolean 
CQs is defined in terms of matches into interpretations. 


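Definition 4 (Semantics of Boolean CQs). For a Boolean conjunctive query
$\varphi$ and an interpretation $\mathcal{I}$, $\mathcal{I} \models \varphi$ iff there exists a function $\pi: \mathrm{Var}(\varphi) \cup \mathrm{Ind}(\varphi) \to
\Delta^{\mathcal{I}}$ with 1. $\pi(a) = a^{\mathcal{I}}$ for all $a \in \mathrm{Ind}(\varphi)$, 2. $\pi(t) \in A^{\mathcal{I}}$ for all $A(t)$ in $\varphi$, and
3. $(\pi(t), \pi(t')) \in r^{\mathcal{I}}$ for all $r(t, t')$ in $\varphi$.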


Hence, an interpretation satisfies a Boolean CQ if the interpretation can respect 
its constraints. Boolean CQs form the basis for the semantics of Boolean MTCQs. 
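
On a finite interpretation, Definition 4 can be checked by brute force over all variable assignments. The sketch below is only an illustration of the definition; the atom encoding, the `?` prefix for variables, and the function name `holds` are assumptions made here, and the search is exponential in the number of variables.

```python
from itertools import product

def holds(cq_atoms, domain, concepts, roles, names):
    """Search for a match of a Boolean CQ into a finite interpretation.

    cq_atoms: list of ("c", A, t) concept atoms and ("r", r, t, t2) role atoms;
              terms starting with "?" are variables, all others individual names.
    names:    maps each individual name to its domain element.
    """
    terms = {t for atom in cq_atoms for t in atom[2:]}
    variables = sorted(t for t in terms if t.startswith("?"))
    for values in product(sorted(domain), repeat=len(variables)):
        match = dict(zip(variables, values))
        # Condition 1: individual names are mapped to their interpretation.
        match.update({t: names[t] for t in terms if not t.startswith("?")})
        # Conditions 2 and 3: every concept and role atom is satisfied.
        if all(
            match[a[2]] in concepts.get(a[1], set()) if a[0] == "c"
            else (match[a[2]], match[a[3]]) in roles.get(a[1], set())
            for a in cq_atoms
        ):
            return True
    return False

domain = {"h", "v"}
concepts = {"Human": {"h"}, "Vehicle": {"v"}}
roles = {"drives": {("h", "v")}}
names = {"h": "h", "v": "v"}
# Boolean CQ: there is some vehicle that h drives.
drives_some_vehicle = [("c", "Vehicle", "?v"), ("r", "drives", "h", "?v")]
```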


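Definition 5 (Semantics of Boolean MTCQs). Let $\mathfrak{J} = (\mathcal{I}_i)_{i \in \{0,\ldots,m\}}$ be a
temporal interpretation and $i \in \{0, \ldots, m\}$. The semantics of Boolean MTCQs
is given by structural induction:


— $\mathfrak{J}, i \models \Phi$ iff $\mathcal{I}_i \models \Phi$, if $\Phi$ is a Boolean CQ;

— $\mathfrak{J}, i \models \neg\Phi$ iff $\mathfrak{J}, i \not\models \Phi$;

— $\mathfrak{J}, i \models \Phi_1 \wedge \Phi_2$ iff $\mathfrak{J}, i \models \Phi_1$ and $\mathfrak{J}, i \models \Phi_2$;

— $\mathfrak{J}, i \models \Phi_1\,U_{[a,b]}\,\Phi_2$ iff there is a $k \in [a, b]$ with $i + k \le m$ such that $\mathfrak{J}, i + k \models \Phi_2$
and $\mathfrak{J}, i + j \models \Phi_1$ for all $j \in [a, k)$;

— $\mathfrak{J}, i \models \Phi_1\,U\,\Phi_2$ iff there is a $k \in [i, m]$ such that $\mathfrak{J}, k \models \Phi_2$ and $\mathfrak{J}, j \models \Phi_1$ for
all $j \in [i, k)$.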


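We allow the typical abbreviations $\Phi \vee \Phi'$ for $\neg(\neg\Phi \wedge \neg\Phi')$, false for $\exists x.A(x) \wedge
\neg\exists x.A(x)$ for some $A \in N_C$, true for $\neg$false, $\Diamond_{[a,b]}\Phi$ for true $U_{[a,b]}\,\Phi$, $\Diamond\Phi$ for true $U\,\Phi$,
$\Box_{[a,b]}\Phi$ for $\neg\Diamond_{[a,b]}\neg\Phi$, and $\Box\Phi$ for $\neg\Diamond\neg\Phi$. The strong next-operator is defined as
$\bigcirc\Phi = \Diamond_{[1,1]}\Phi$ and weak next as $\bullet\Phi = \Box_{[1,1]}\Phi$. Note that finite trace semantics
exhibit some non-obvious behaviors, e.g., $\Diamond\Box\Phi$ is equivalent to $\Box\Diamond\Phi$ [17].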
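
At the propositional level, the two until operators and the derived abbreviations can be evaluated directly on a finite trace. The following sketch uses a tuple encoding and function names (`sat`, `eventually`, `always`) that are assumptions for illustration; atoms are plain propositions rather than CQs, and a trace is a list of sets of propositions.

```python
TRUE = ("true",)

def sat(f, trace, i=0):
    """Evaluate a formula at position i of a finite trace (list of sets)."""
    m = len(trace) - 1
    tag = f[0]
    if tag == "true":
        return True
    if tag == "ap":
        return f[1] in trace[i]
    if tag == "not":
        return not sat(f[1], trace, i)
    if tag == "and":
        return sat(f[1], trace, i) and sat(f[2], trace, i)
    if tag == "until":  # unbounded until, k ranges over absolute positions
        return any(
            sat(f[2], trace, k) and all(sat(f[1], trace, j) for j in range(i, k))
            for k in range(i, m + 1)
        )
    if tag == "buntil":  # bounded until, k is an offset within [a, b]
        _, a, b, g, h = f
        return any(
            i + k <= m and sat(h, trace, i + k)
            and all(sat(g, trace, i + j) for j in range(a, k))
            for k in range(a, b + 1)
        )
    raise ValueError(f"unknown operator: {tag}")

def eventually(f):
    return ("until", TRUE, f)

def always(f):
    return ("not", eventually(("not", f)))

p, q = ("ap", "p"), ("ap", "q")
```

For instance, on the two-step trace `[set(), {"ped"}]` the formula `eventually(("not", ("ap", "ped")))` holds while `always(("not", ("ap", "ped")))` does not, mirroring the pedestrian example from Section 1 at the propositional level.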

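A central problem over Boolean MTCQs is entailment: For a temporal KB
$\mathcal{K}$ and an MTCQ $\Phi$, we say $\mathcal{K} \models \Phi$ if for all temporal interpretations $\mathfrak{J}$ with
$\mathfrak{J} \models \mathcal{K}$ also $\mathfrak{J}, 0 \models \Phi$ holds. For example, for $\mathcal{K}^{ex}$ from Section 1, it holds that
$\mathcal{K}^{ex} \models \Diamond\neg\mathit{Pedestrian}(h)$ as for any temporal interpretation $\mathfrak{J} = (\mathcal{I}_0, \mathcal{I}_1)$ with
$\mathcal{I}_0 \models \mathcal{O}^{ex}$ and $\mathcal{I}_0 \models \mathcal{D}^{ex}_0$, it must also hold that $h^{\mathcal{I}_0} \in (\neg\mathit{Pedestrian})^{\mathcal{I}_0}$ due to
the fact that $h$ is inferred to be a driver and thus cannot be a pedestrian.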

We remark that this semantics is closely related to the one over temporal 
conjunctive queries (TCQs) introduced by Baader et al. [8] to query temporal 
KBs over arbitrary models, i.e., not restricted to mission time. In fact, it is not 
difficult to see that entailment $\mathcal{K} \models \Phi$ for Boolean MTCQs $\Phi$ can be reduced to
deciding whether $\mathcal{K}$ entails $\Phi'$ in the sense of Baader et al. [8] for some TCQ $\Phi'$
that can be computed in polynomial time from $\Phi$; we denote the latter entailment
relation with $\mathcal{K} \models^{\mathrm{BBL}} \Phi'$. In the mentioned paper it is also shown that the latter
entailment problem is in ExpTime. Together with the ExpTime lower bound for
subsumption in $\mathcal{ALC}$, this shows that MTCQ entailment is ExpTime-complete.
Of course, the complexity is potentially higher for ontology languages beyond
$\mathcal{ALC}$. Finally, if in place of CQs in MTCQs we allow for $\mathcal{ALC}$-concepts, the
resulting language can be embedded into the metric temporal DLs discussed by
Gutiérrez-Basulto et al. [23].

While Boolean MTCQ entailment is the natural problem to consider for complexity
analysis, a practical system needs support for answering non-Boolean
MTCQs, which is defined based on entailment. Let $\mathcal{K} = (\mathcal{O}, (\mathcal{D}_i)_{i \in \{0,\ldots,n\}})$ be a
temporal KB, $\Phi(\vec{x})$ an MTCQ with answer variables $\vec{x}$, and $\vec{a}$ a tuple of individuals
from $\mathcal{K}$, i.e., $\vec{a} \subseteq \mathrm{Ind}(\mathcal{K}) := \bigcup_{i = 0,\ldots,n} \mathrm{Ind}(\mathcal{D}_i)$. We call $\vec{a}$ a certain answer
to $\Phi(\vec{x})$ over $\mathcal{K}$ if $\mathcal{K} \models \Phi(\vec{a})$. Here, $\Phi(\vec{a})$ is the uniform replacement of the variables
in $\vec{x}$ by the individual names in $\vec{a}$, leading to a Boolean MTCQ. Our main
reasoning task is to compute the set $\mathrm{cert}_{\mathcal{K}}(\Phi)$ of certain answers of $\Phi$ over $\mathcal{K}$.
Section 1 gives an example for this set: $\mathrm{cert}_{\mathcal{K}^{ex}}(\Diamond\neg\mathit{Pedestrian}(x)) = \{h\}$.


4 Computing Certain Answers in Practice 


We start by noting that to compute $\mathrm{cert}_{\mathcal{K}}(\Phi)$, it is not sufficient to answer all
of $\Phi$'s CQs at time $i$ and combine them inductively according to the semantics
due to the presence of disjunction in our query language. An example is the
MTCQ $\Phi_\vee(x) := B(x) \vee C(x)$ over the temporal KB $\mathcal{K}_\vee = (\{A \sqsubseteq B \sqcup C\}, (\{A(a)\}))$,
where $\mathrm{cert}_{\mathcal{K}_\vee}(\Phi_\vee) = \{(a)\}$. A separate check of $B(x)$ and $C(x)$ returns no answer,
and inductive combination falsely yields no answer as well. This issue explains
the restriction to conjunctions in existing CQ answering implementations over
expressive DLs, as complexity is reduced and various optimizations can be employed.
Therefore, and in contrast to both LTL$_f$ over propositional atoms and
CQ answering, we require a more involved procedure for checking MTCQs.
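
The effect of the disjunction can be reproduced with a small enumeration. The sketch below restricts attention to interpretations over the single element `a`, which is an assumption made for brevity: this suffices to refute entailment of each disjunct on its own, and it illustrates (without proving for arbitrary domains) why the disjunction is entailed.

```python
from itertools import product

# Candidate interpretations over the one-element domain {a}: A(a) holds by the
# data, and each candidate fixes whether a belongs to B and to C.
models = []
for in_b, in_c in product([False, True], repeat=2):
    # The inclusion of A in the union of B and C demands that a is in B or C.
    if in_b or in_c:
        models.append({"B": in_b, "C": in_c})

entails_b = all(m["B"] for m in models)              # B(a) in every model?
entails_c = all(m["C"] for m in models)              # C(a) in every model?
entails_b_or_c = all(m["B"] or m["C"] for m in models)
```

Neither `entails_b` nor `entails_c` is true, since one model places `a` only in `C` and another only in `B`, whereas `entails_b_or_c` is true.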

The correct but naive way to compute $\mathrm{cert}_{\mathcal{K}}(\Phi)$ is to enumerate all candidate
answers $\vec{a} \subseteq \mathrm{Ind}(\mathcal{K})$ and decide whether $\mathcal{K} \models^{\mathrm{BBL}} \Phi(\vec{a})$ via the algorithms provided
by Baader et al. [8] (for the temporal aspects) and Horrocks and Tessaris
[25] (for answering disjunctions of conjunctive queries). This, however, suffers
from several problems. First, there are potentially many answer candidates since
the number of relevant tuples is exponential in the arity of the query $\Phi$. Second,
while the mentioned algorithm for deciding $\models^{\mathrm{BBL}}$ is useful for a complexity
analysis, it does not lend itself to a direct implementation. Finally, the algorithm
of Baader et al. works over unrestricted models and is thus more difficult
to implement. This section provides the foundations for the algorithm that we
implemented in our tool and the central improvements needed to make it work
in practice.

As MTCQs are closed under negation, entailment is just the complement of
satisfiability: a Boolean MTCQ $\Phi$ is satisfiable w.r.t. a temporal KB $\mathcal{K}$ if there
is a model $\mathfrak{J}$ of $\mathcal{K}$ with $\mathfrak{J}, 0 \models \Phi$. As $\mathcal{K} \models \Phi$ iff $\neg\Phi$ is unsatisfiable w.r.t. $\mathcal{K}$, we
can, for the sake of convenience, focus on satisfiability in the following.

We need some preliminary notions. Given an MTCQ $\Phi$ (possibly with answer
variables), we denote with $\mathrm{CQ}(\Phi)$ the set of all CQs in $\Phi$. The propositional
abstraction $\mathrm{PA}(\Phi)$ of $\Phi$ is the replacement of each $\varphi \in \mathrm{CQ}(\Phi)$ with a propositional
variable $p_\varphi$. Note that the propositional abstraction of an MTCQ is an MLTL
formula potentially with an unconstrained until, which is the underlying temporal
formalism. This TL is interpreted over finite words $P_0 \cdots P_n$ where each $P_i$
specifies the propositional variables that are satisfied at time point $i$. Boolean
operators are interpreted as usual and temporal operators $U$ and $U_{[a,b]}$ are
interpreted in line with Definition 5. The following characterization of satisfiability
is easy to prove from the definitions.


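Lemma 1. For a Boolean MTCQ $\Phi$ and a temporal KB $\mathcal{K} = (\mathcal{O}, (\mathcal{D}_i)_{i \in \{0,\ldots,n\}})$,
$\Phi$ is satisfiable w.r.t. $\mathcal{K}$ iff there is a sequence $X_0, \ldots, X_n$ of subsets of $\mathrm{CQ}(\Phi)$
such that:


1. there are interpretations $\mathcal{I}_0, \ldots, \mathcal{I}_n$ over the same domain such that, for all
$i \in \{0, \ldots, n\}$, we have $\mathcal{I}_i \models \mathcal{O}$, $\mathcal{I}_i \models \mathcal{D}_i$, $\mathcal{I}_i \models \varphi$ for every $\varphi \in X_i$, and
$\mathcal{I}_i \not\models \varphi$ for every $\varphi \in \mathrm{CQ}(\Phi) \setminus X_i$, and
2. $(\{p_\varphi \mid \varphi \in X_i\})_{i \in \{0,\ldots,n\}}$ satisfies $\mathrm{PA}(\Phi)$.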


Intuitively, Lemma 1 splits the problem of deciding MTCQ satisfiability into separate
DL and TL tasks which are only connected by the sets of CQs $X_0, \ldots, X_n$.
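
The propositional abstraction used in Point 2 amounts to a recursive replacement of CQ atoms by fresh propositional variables. The following is a minimal sketch under an assumed tuple encoding of MTCQs, in which a CQ is represented by its text; the function name `abstract` and the variable naming scheme `p0`, `p1`, ... are illustrative only.

```python
def abstract(f, table):
    """Replace every CQ atom by a propositional variable.

    f:     ("cq", text), ("not", g), ("and", g, h), ("until", g, h),
           or ("buntil", a, b, g, h)
    table: dict filled in place, mapping each distinct CQ to its variable
    """
    tag = f[0]
    if tag == "cq":
        # The same CQ always receives the same propositional variable.
        return ("ap", table.setdefault(f[1], f"p{len(table)}"))
    if tag == "buntil":
        return ("buntil", f[1], f[2], abstract(f[3], table), abstract(f[4], table))
    return (tag,) + tuple(abstract(g, table) for g in f[1:])

table = {}
query = ("and", ("cq", "A(x)"), ("until", ("cq", "A(x)"), ("cq", "r(x,y)")))
abstraction = abstract(query, table)
```

The resulting `table` records the connection between the DL side (the CQs) and the TL side (the propositions), which is exactly the interface formed by the sets $X_0, \ldots, X_n$.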
Lemma 1 can be further refined as follows. The requirement that all interpretations
$\mathcal{I}_0, \ldots, \mathcal{I}_n$ be over the same domain can be dropped without compromising
correctness. Indeed, we can combine $\mathcal{I}_0, \ldots, \mathcal{I}_n$ witnessing Point 1 in
Lemma 1 but with potentially different domains into $\mathcal{I}'_0, \ldots, \mathcal{I}'_n$ with the same
domain using a standard argument, cf. the proof of Theorem 5.21 by Lippmann
[30]: Since $\mathcal{ALC}$ cannot enforce finite models, we can assume that each $\mathcal{I}_i$ is infinite.
By the downward Löwenheim-Skolem theorem, we can assume that the $\mathcal{I}_i$
are countably infinite and thus have the same domain. It remains to identify the
interpretation of the individual names. Note that the argument goes through for
more expressive logics such as $\mathcal{SROIQ}^{(D)}$.


Lemma 2. Lemma 1 remains valid when “over the same domain” is dropped 
from Point 1. 


Hence, the checks at each time point in (the modified) Point 1 are independent. It
remains to show how we can implement the check of Point 1, which includes
negated CQs. By the natural connection between satisfiability and entailment, we
can leverage an engine for answering disjunctions of CQs over non-temporal $\mathcal{ALC}$
KBs for this, i.e., computing $\mathrm{cert}_{\mathcal{K}}(\Phi)$ for $\mathcal{K} = (\mathcal{O}, \mathcal{D})$ and $\Phi$ a disjunction of CQs.
For doing so, we associate with every Boolean CQ $\varphi$ its canonical database $\mathcal{D}_\varphi$,
which is just the set of all conjuncts that occur in $\varphi$. (For the sake of simplicity,
we allow variable names from $\varphi$ as individual names in $\mathcal{D}_\varphi$.) We then exploit the
following observation.


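Observation 1. Let $X$ be a set of Boolean CQs, let $\mathcal{O}$ be an $\mathcal{ALC}$-ontology and
$\mathcal{D}$ the data. Then the following are equivalent for every subset $Z \subseteq X$:


(a) There is a model $\mathcal{I}$ of $\mathcal{O}$ and $\mathcal{D}$ such that $\mathcal{I} \models \varphi$ for every $\varphi \in Z$, and $\mathcal{I} \not\models \varphi$
for every $\varphi \in X \setminus Z$.

(b) $(\mathcal{O}, \mathcal{D}') \not\models \bigvee_{\varphi \in X \setminus Z} \varphi$ where $\mathcal{D}'$ is the union of $\mathcal{D}$ with $\mathcal{D}_\varphi$ for each $\varphi \in Z$
(with variables across different $\mathcal{D}_\varphi$ suitably renamed).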


Thus, to check the modified Point 1 for some time point (a condition of shape (a)
in the above observation), we can check its reformulation as (b) using a
(non-temporal) query engine for disjunctions of CQs. As demonstrated by the exemplary
query $\Phi_\vee(x)$, this is, however, more involved than answering each disjunct
separately, a problem already known to the DL community. For correctly
answering such disjunctions of CQs, we require a reformulation of the disjunction
into conjunctive normal form, and then answer each conjunct separately as
described by Horrocks and Tessaris [25]. For $P \subseteq \{p_\varphi \mid \varphi \in \mathrm{CQ}(\Phi)\}$, we define $\mathrm{VAL}_i(P)$
as true iff $\mathcal{O}$, $\mathcal{D} := \mathcal{D}_i$, $Z := \{\varphi \mid p_\varphi \in P\}$, $X := \mathrm{CQ}(\Phi)$ pass the test in Point (b),
and thus the modified Point 1.
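
The input for the test in Point (b) can be assembled mechanically. The sketch below only builds the extended data and the list of disjuncts; deciding the entailment itself is left to an external engine for disjunctions of CQs. The atom encoding, the `?` prefix for variables, and the names `canonical_database` and `build_check` are assumptions of this sketch.

```python
def canonical_database(cq_atoms, suffix):
    """Read the atoms of a Boolean CQ as facts.

    Variables (terms starting with "?") become fresh individual names, made
    unique per CQ by the given suffix so that different CQs do not clash.
    """
    def fresh(term):
        return f"{term[1:]}_{suffix}" if term.startswith("?") else term
    return {(a[0], a[1]) + tuple(fresh(t) for t in a[2:]) for a in cq_atoms}

def build_check(data, cqs, z):
    """Return the extended data and the disjuncts for the test in Point (b).

    data: set of facts, cqs: dict from CQ identifiers to atom lists,
    z: identifiers of the CQs that are supposed to hold.
    """
    extended = set(data)
    for ident in sorted(z):
        extended |= canonical_database(cqs[ident], suffix=ident)
    # The remaining CQs form the disjunction that must not be entailed.
    disjuncts = [cqs[ident] for ident in sorted(cqs) if ident not in z]
    return extended, disjuncts

data = {("c", "A", "a")}
cqs = {"q1": [("r", "r", "a", "?y")], "q2": [("c", "B", "?y")]}
extended, disjuncts = build_check(data, cqs, {"q1"})
```

Note that the variable `?y` of `q1` is turned into the individual `y_q1`, while the same variable name in `q2` is left untouched because `q2` is only used as a query.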

To implement Point 2, we exploit that for each MLTL formula $\chi$ over some
set of propositions $X$, one can compute an equivalent LTL$_f$ (LTL over finite
traces) formula $\chi'$ over $X$ [29], which in turn can be transformed into a finite
automaton (FA) $\mathcal{A}_\chi$ over $2^X$ which recognizes precisely the models of $\chi'$ and
thus of $\chi$ [17]. Both these transformations are not polynomial and there is, in
general, no efficient conversion of an MLTL formula to an FA. However, since
queries are often small in practice, this is still feasible. For example, the minimal
FA for $p_1\,U_{\le a}\,\Diamond_{\le b}\,p_2$ has $a + b + 3$ states. Figure 1 shows the FA for answering
the simple MTCQ $\Phi(x, y) = \Box A(x) \wedge \Diamond r(x, y)$. Note that the transitions are
labeled with Boolean formulas over the propositions indicating a transition for
each model of the formula, which can be exponentially more succinct.


Fig. 1. FA for $\mathrm{PA}(\neg(\Box A(x) \wedge \Diamond r(x, y)))$.


Algorithm 1 Computing certain answers to MTCQs.
Input: MTCQ $\Phi(\vec{x})$, temporal KB $\mathcal{K} = (\mathcal{O}, (\mathcal{D}_i)_{i \in \{0,\ldots,n\}})$
Output: $\mathrm{cert}_{\mathcal{K}}(\Phi)$.

 1: $\mathcal{A}$ := Construct_FA($\mathrm{PA}(\neg\Phi)$);
 2: // states $Q$, initial state $q_0$, final states $F$, transitions $\Delta$
 3: $C$ := $\mathrm{Ind}(\mathcal{K})^k$ where $k = |\vec{x}|$
 4: Initialize $S(\vec{a}, 0) := \{q_0\}$ for all $\vec{a} \in C$
 5: for $i$ := 1 to $n + 1$ do
 6:     for $\vec{a} \in C$ do
 7:         $S(\vec{a}, i) := \emptyset$
 8:         for $q \in S(\vec{a}, i - 1)$ do
 9:             $S(\vec{a}, i) := S(\vec{a}, i) \cup \{q' \mid (q, X, q') \in \Delta, \mathrm{VAL}_{i-1}(X)\}$
10:         end for
11:     end for
12: end for
13: return $\{\vec{a} \in C \mid S(\vec{a}, n + 1) \cap F = \emptyset\}$;


What was said so far suggests the basic procedure for computing $\mathrm{cert}_{\mathcal{K}}(\Phi)$
that is depicted in Algorithm 1. It considers for each answer candidate $\vec{a}$ all possible
'runs' $X_0, \ldots, X_n$ in a step-by-step fashion and checks (modified) Points 1
and 2 after each step; the set $S(\vec{a}, i)$ contains all states the FA corresponding to
$\neg\Phi(\vec{a})$ can reach after $i$ steps. The central test happens in Line 9 and is given
here for the direct encoding of the transitions; it can easily be adapted for the
mentioned succinct encoding. The algorithm returns all $\vec{a}$ for which no final state
is reachable after $n + 1$ steps. Applied to the example FA in Figure 1 and a candidate
answer $(a, b)$ this means that the FA ends up in state $q_1$ in all possible
runs, according to the temporal KB. The only way to achieve this is for the FA
to not stay in $q_0$ or $q_2$. For this, it has to eventually change from $q_0$ to $q_1$ by
having neither $A(a) \wedge \neg r(a, b)$ nor $\neg A(a)$ but $A(a) \wedge r(a, b)$ satisfiable. The FA
shall then stay solely in $q_1$ with only $A(a)$ satisfiable for the remainder. Clearly,
in this case $(a, b)$ is a certain answer.
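
The state-set propagation of Algorithm 1 can be sketched in a few lines. In the following illustration, the FA is given explicitly with the direct transition encoding, and the DL check is replaced by a toy oracle `val`; the oracle, the example automaton, and all function names are assumptions of this sketch and do not reflect the interface of the actual tool.

```python
from itertools import product

def certain_answers(individuals, arity, n, q0, final, delta, val):
    """Propagate reachable FA states per candidate, as in Algorithm 1.

    delta: set of (q, X, q2) with X a frozenset of propositions (one valuation).
    val(candidate, i, X): hypothetical oracle, true iff valuation X is
    satisfiable for the candidate w.r.t. the data at time point i.
    """
    answers = []
    for cand in product(sorted(individuals), repeat=arity):
        states = {q0}
        for i in range(n + 1):  # one step per time point 0..n
            states = {q2 for (q, X, q2) in delta
                      if q in states and val(cand, i, X)}
        if not (states & final):  # the FA of the negated query cannot accept
            answers.append(cand)
    return answers

# FA for the abstraction of the negation of "eventually P(x)": it accepts
# exactly the traces on which p never holds (q0 is final, q1 is a sink).
P, NOT_P = frozenset({"p"}), frozenset()
delta = {("q0", NOT_P, "q0"), ("q0", P, "q1"),
         ("q1", NOT_P, "q1"), ("q1", P, "q1")}

# Toy oracle: entailed[i] lists the individuals for which P is entailed at
# time i. Under the open-world assumption, p can always be true, but it can
# only be false for individuals for which P is not entailed.
entailed = [set(), {"a"}]

def val(cand, i, X):
    return "p" in X or cand[0] not in entailed[i]

result = certain_answers({"a", "b"}, 1, 1, "q0", {"q0"}, delta, val)
```

In this toy setting only `a` is returned: at the second time point the valuation without `p` is unsatisfiable for `a`, so every run for `a` leaves the final state, whereas `b` can remain in it.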
4.1 Improvements


Some standard improvements over Algorithm 1 are applicable, e.g., to work 
directly on a minimal FA. However, this does not yet address the problem of 
the many answer candidates to consider, of which, in practice, only few will be 
entailed. Algorithm 1 considers each candidate individually, which is inefficient 
since similar tasks are repeatedly executed. We instead leverage existing systems 
that implement efficient algorithms specifically tailored towards answering CQs 
over standard (non-temporal) KBs. As an example, consider again the FA in
Figure 1. Observe that $q_2 \in S(\vec{a}, i)$ for all $\vec{a}$, $i$ for which $(\mathcal{O}, \mathcal{D}_{i-1}) \not\models A(\vec{a})$.
Indeed, $\neg A(\vec{a})$ is satisfiable w.r.t. $(\mathcal{O}, \mathcal{D}_{i-1})$ for those $\vec{a}$, $i$. Since $q_2$ is a sink, this
allows us to instantly reject all non-answers to $A(x)$. We now generalize this to
extract certain (non-)answers by answering the CQs occurring in the edges.


The main idea is to perform an under-approximating traversal of the FA
prior to Algorithm 1. More concretely, we use CQ answering to construct sets
$R(\vec{a}, i) \subseteq S(\vec{a}, i)$ and $U(\vec{a}, i) \subseteq Q \setminus S(\vec{a}, i)$ that under-approximate the reachable
and unreachable states, respectively, for a candidate $\vec{a}$ at time $i$. This serves two
purposes. First, we can already extract some certain answers from $U$ and some
certain non-answers from $R$, namely the sets $\{\vec{a} \in C \mid U(\vec{a}, n + 1) \supseteq F\}$ and
$\{\vec{a} \in C \mid R(\vec{a}, n + 1) \subseteq F\}$, respectively. These candidates are not considered
anymore during the run of Algorithm 1. Second, we are able to re-use cached
answers to CQs in the first traversal during Algorithm 1.


We now describe how to construct the sets $R$ and $U$ during FA traversal.
$R(\vec{a}, 0)$ is initialized as $\{q_0\}$ and $U(\vec{a}, 0)$ is initialized as $Q \setminus \{q_0\}$, for all $\vec{a}$. For the
update step with $i > 0$, we assume for all states $q_k$, $q_l$ to have succinctly encoded
edges $\alpha_{k,l} := \bigwedge_{p \in P_0} p \wedge \bigwedge_{p \in P_1} \neg p$ for some sets $P_0, P_1 \subseteq P$, as already used
in Figure 1. For the edges in the FA at time $i$, we use a CQ engine
on $\mathcal{K}_i := (\mathcal{O}, \mathcal{D}_i)$ to compute $\mathrm{cert}_{\mathcal{K}_i}(\varphi)$ for all $\varphi \in \{\psi \mid p_\psi \in P_0\} \cup \{\bigwedge_{p_\psi \in P_1} \psi\}$.
From these sets, we are able to extract information on the relevant queries:


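1. for all $\vec{a} \notin \mathrm{cert}_{\mathcal{K}_i}(\varphi)$: $\neg\varphi(\vec{a})$ is satisfiable w.r.t. $\mathcal{K}_i$;
2. for all $\vec{a} \in \mathrm{cert}_{\mathcal{K}_i}(\varphi)$: $\varphi(\vec{a})$ is satisfiable and $\neg\varphi(\vec{a})$ is unsatisfiable w.r.t. $\mathcal{K}_i$.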


We transfer this knowledge about the (un-)satisfiability of $\varphi(\vec{a})$ and $\neg\varphi(\vec{a})$ to
the edges $\alpha_{k,l}$. Satisfiability knowledge is transferable if $q_k \in R(\vec{a}, i - 1)$ and
$\alpha_{k,l} = p_\varphi$ resp. $\alpha_{k,l} = \neg p_\varphi$. We then add $q_l$ to $R(\vec{a}, i)$. Unsatisfiability knowledge
on $\neg\varphi(\vec{a})$ is transferable if $\alpha_{k,l}$ contains $\neg p_\varphi$. Adding unsatisfiability knowledge
to $U$ requires adaptations. Firstly, we can only add $q_l$ to $U(\vec{a}, i)$ if all other
edges $\alpha_{k',l}$ to $q_l$ also agree on unsatisfiability of $\vec{a}$ at time $i$, i.e., they contain
some $\neg p_{\varphi'}$ for which $\neg\varphi'(\vec{a})$ is known to be unsatisfiable or $q_{k'} \in U(\vec{a}, i - 1)$.
Secondly, unsatisfiability generates new satisfiability information: for a state $q_k$
with successors $q_{l_1}, \ldots, q_{l_m}$, we know that $\{q_{l_1}, \ldots, q_{l_{m-1}}\} \subseteq U(\vec{a}, i)$ implies $q_{l_m} \in
R(\vec{a}, i)$. Together with the described acceptance condition, the sets $R(\vec{a}, n + 1)$
and $U(\vec{a}, n + 1)$ deliver an under-approximation of the certain (non-)answers.
4.2 Our System


We implemented this approach as a module in the DL reasoner OPENLLET [37]. 
The implementation is available at https://github.com/lu-w/topllet. Our 
module does not support full MTCQs yet. Instead of allowing arbitrary CQs as 
atoms, we allow the subclass tCQ of CQ which consists of all CQs $\varphi$ s.t. in the
graph $G_\varphi = (V, E)$ with $V = \mathrm{Var}(\varphi) \cup \mathrm{Ind}(\varphi)$ and $E = \{(t, r, t') \mid r(t, t') \in \varphi\}$
each vertex has at most one incoming edge and, if interpreted undirectedly, $G_\varphi$
is acyclic, i.e., the query graph is tree-shaped*.
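
Membership in tCQ can be tested on the role atoms alone. The following sketch checks the two stated conditions with a union-find structure; the function name `is_tcq` and the edge encoding are assumptions for illustration and are not taken from the implementation.

```python
def is_tcq(role_atoms):
    """Check the tree-shape condition on the query graph.

    role_atoms: iterable of (t, r, t2) triples, one per role atom r(t, t2).
    """
    edges = list(role_atoms)
    # Condition 1: every vertex has at most one incoming edge.
    incoming = {}
    for (_, _, target) in edges:
        incoming[target] = incoming.get(target, 0) + 1
        if incoming[target] > 1:
            return False
    # Condition 2: the undirected graph is acyclic. With union-find, an edge
    # whose endpoints are already connected would close a cycle.
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (source, _, target) in edges:
        a, b = find(source), find(target)
        if a == b:
            return False
        parent[a] = b
    return True
```

For example, `is_tcq([("x", "r", "y"), ("y", "s", "z")])` holds, whereas a query with the atoms `r(x, y)` and `s(y, x)` is rejected because its undirected graph contains a cycle.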

We denote with tMTCQ the subclass of MTCQ where each CQ is in fact a
tCQ. The reasons for considering this query class are two-fold. First, most queries
that occur in practice are tMTCQs. Second, tCQ answering can be implemented
by a straightforward procedure of 'rolling-up' the query graph [25]. Therefore,
OPENLLET already provides a tCQ-answering engine over $\mathcal{SROIQ}^{(D)}$ KBs,
implementing many optimizations [36]. Moreover, the procedure can be adapted
to answering disjunctions of tCQs as described by Horrocks and Tessaris [25],
which is required for our algorithm, cf. Point (b) in Observation 1.

As a first necessary step, we thus extended OPENLLET to being able to answer
disjunctions of tCQs. For the construction of the FA, we implemented the
conversion of MLTL to LTL$_f$ described by Li et al. [29]: essentially, the intervals
in $U_{[a,b]}$ are encoded using sequences of the next-operator $\bigcirc$ of length $a$ and $b$,
respectively. We then rely on LYDIA, which converts LTL$_f$ formulas to equivalent
deterministic FA [20]. We extend and use the AUTOMATALIB [28] to access the
resulting FA. We provide a test suite for our system to highlight correctness of
the implemented algorithms.
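
The idea of encoding an interval by nested next-operators can be illustrated as follows. This is a sketch in the spirit of that conversion, not the implemented translation: it follows the bounded-until semantics of Definition 5, uses only the strong next operator `X`, and emits a generic textual syntax that is an assumption here and not necessarily the input format of any particular tool.

```python
def next_n(formula, n):
    """Prefix a formula with n strong next operators."""
    for _ in range(n):
        formula = f"X({formula})"
    return formula

def expand_bounded_until(phi, psi, a, b):
    """Unfold phi U_[a,b] psi into a formula using only X and Booleans.

    After skipping a steps, psi must hold within the next b - a steps,
    with phi holding at every step before that, starting at offset a.
    """
    if not 0 <= a <= b:
        raise ValueError("expected 0 <= a <= b")
    body = psi
    for _ in range(b - a):
        body = f"({psi} | ({phi} & X({body})))"
    return next_n(body, a)
```

For `expand_bounded_until("p", "q", 1, 2)` this yields `X((q | (p & X(q))))`. The size of the result grows linearly in `b` for a single operator, and nesting bounded operators multiplies these factors, which is one reason why the conversion is not polynomial in general.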


5 Benchmarks 


Our CQ answering approach motivates the need for empirical evaluation, for 
which ideally controlled real-world data is used. In fact, for one experiment, we 
obtained drone data from an intersection in Germany. These data turned out to 
be insufficient for a thorough evaluation, as they are proprietary and not scalable. 
This calls for synthetic yet realistic benchmark data that can be randomized, 
scaled in size, and are freely available for replicability. However, we are currently 
not aware of any public benchmark data on querying temporal KBs. The same 
was noted by the developers of METEOR, where data of the Lehigh University 
Benchmark [21] are extended with random intervals to enable an evaluation on 
the OWL RL fragment of LUBM. Unfortunately, a random extension of a non- 
temporal benchmark might not reflect actual temporal data, e.g., in continuity of 
concepts over time, and thus might not transfer to real-world applications. As our 
final contribution, we hence present the Traffic Ontology Benchmark (TOBM), a
benchmark generator for scenarios of automated driving applications that mimics
real-world data and enables the evaluation of tools on temporal KBs, including MTCQ
answering. The tool is available at https://github.com/lu-w/tobm.


Fig. 2. A scene of the T-crossing scenario sampled from TOBM.


* This constraint allows us to perform the rolling-up procedure on the BCQs of the
FA. However, it is actually just a sufficient condition for rolling-up. More precisely,
we require the FA to contain only BCQs where each negated query is a tCQ.

For the ontology we rely on the publicly available Automotive Urban Traffic 
Ontology (A.U.T.O.) [42, Section 5]. It is a conglomerate of $\mathcal{SROIQ}^{(D)}$ ontologies
for the traffic domain and related fields, and currently consists of 1449 axioms 
over 676 concepts and 213 roles. A.U.T.O. was already successfully used for 
analyzing real-world traffic data from drone recordings [42, Section 8]. 

The benchmark generator creates temporal data for A.U.T.O. with the number of
individuals scaling linearly in some $N > 0$. A seed $S$ can be used for pseudo-randomization.
From both parameters, it generates scenarios of a certain length (by default, 20 
seconds). These can be sampled from two settings: 


1. A T-crossing setup with parking vehicles, a pedestrian crossing, bikeway
lanes, pedestrians, bicyclists, and passenger cars (cf. Figure 2). It has $8 \cdot N + 22$
individuals.

2. An X-crossing of two urban roads with traffic signs and dysfunctional traffic
lights. Compared to the T-crossing, there are no bicyclists and $5 \cdot N + 69$
individuals.


The scenarios are created based on behavior models for pedestrians, bicy- 
clists, and passenger cars. Passenger cars and bicyclists drive up to a speed limit 
if their front area is free, otherwise they use a following mode. Vehicles yield on a 
predicted intersecting path. Moreover, a random successor lane is selected when 
turning at intersections, giving a turning signal with a probability of 3% at each
time point. Pedestrians follow their walkway, but can randomly initiate road
crossing with a probability of 0.7%. A visualization of two exemplary
scenarios can be found in the linked repository.

Our implementation models temporal KBs as a list of OWL2-files for the 
data, each importing a shared ontology. Geometrical data are abstracted to 
spatial predicates (e.g., is in front of) in a pre-processing step. For S = 0, 
N = 3, and 20 seconds sampled with 10 Hertz on the T-crossing setting, this 
results in a data sequence with 46 individuals and 647847 assertions in total 
(approx. 3239 per time point) with constant assertions only counted once. 
6 Evaluation


We now examine practical feasibility of our system by an evaluation on TOBM, 
answering the following questions: 


1. Is the approach applicable to practical, a-posteriori situation recognition 
tasks (such as evaluating test data) with larger numbers of assertions? 

2. What is the impact of our improvement of leveraging CQ answering on 
overall applicability? 

3. In practical settings, how much satisfiability knowledge can be generated by 
CQ answering? 


As inputs, we sampled TOBM with S = 0 and $N \in \{1,\ldots,5\}$ for both the
X- and T-crossing. We fix a 20 second duration with ten Hertz, as our algorithm 
performs linear in N. The supplementary artifact provides both the benchmarks 
and a wrapper around TOBM for reproducible re-generation. We used four 
queries (given in the supplementary artifact) asking for: intersecting paths with
VRUs ($\Phi_1$), passing of parking vehicles on two-lane roads ($\Phi_2$), vehicles turning
right ($\Phi_3$), and vehicles changing lanes without signals ($\Phi_4$), where $\Phi_1$, $\Phi_2$, and
$\Phi_3$ have two and $\Phi_4$ has three answer variables. The corresponding FAs have 8
($\Phi_1$), 4 ($\Phi_2$, $\Phi_4$), and 3 ($\Phi_3$) states. Our tool is executed once per benchmark
and query combination, as deviations are not to be expected due to determinism,
on an Intel Core i9-13900K with 64 GB RAM and a time limit of ten hours per 
run, using a Windows Subsystem for Linux 1 on a Windows 10 host. The input 
files and tool, with the exact version and configuration used for benchmarking, 
are available online [41]. 
Fig. 3. Wall clock running times of benchmark queries $\Phi_i$, $i \in \{1,\ldots,4\}$, and the T-
(t) resp. X-crossing (x) of size N.


For the first question, we show wall clock running times of our improved 
algorithm in Figure 3. We exclude parsing and loading of queries and KBs as 
we aim to only evaluate our algorithm. Running times indicate an exponential 
dependency on the data size. There are also dependencies on the benchmark 
type, e.g., for $\Phi_2$, where the non-existence of parking vehicles on the X-crossing
improves performance, and $\Phi_4$, where more lanes on the X-crossing increase
running time. This answers the first question positively, as our approach termi- 
nates in minutes to hours, with the lowest being 25.54 seconds for ®; on the 
20 second T-crossing scenario. However, the timeout was reached for $\Phi_4$ on the
X-crossing and N > 2 for reasons to be discussed later. 
Fig. 4. Log-scaled running times with and without the CQ answering optimization
enabled for the TOBM T-crossing S = 0, N = 3. Running times without the opti- 
mization are extrapolated after one hour. 


The second question is addressed by comparing the running time of the im- 
proved algorithm to the basic algorithm from Algorithm 1. The results in Figure 4 
show that the naïve approach fails for real-world data, even for two answer vari- 
ables. Moreover, most of the time is still spent using the expensive, full semantics 
check despite iterating only through a fraction of all candidates (cf. Table 1). 
Hence, leveraging the CQ engine makes MTCQ answering practically feasible. 
However, some queries may trigger special cases in the optimizations of the CQ 
engine, leading to higher running times, e.g., role inclusion axioms for $\Phi_3$.

The strong effect of leveraging CQ answering motivates deeper examination. 
For this final question, we show wall clock times of both the CQ answering run $t_1$
('first run') and the full-semantics run $t_2$ ('second run') in Figure 4. The effect
of CQ answering can be twofold: Firstly, a set of candidates can be excluded 
globally. Secondly, even if a candidate was not globally excluded, it generates 
‘local’ (non-)answers that can be cached for subsequent checks of Point 1 of 
Lemma 1. We thus report both exclusions, averaged over all time points and 
checked edges at each time point, in Table 1. Moreover, one can ask whether the 
second run is actually worthwhile. Table 1 reports how many certain answers
(cert) were already found in the first run (cert_1).

Our results show CQ answering to aid mainly by excluding candidates glob- 
ally in a highly-optimized fashion, as it can resort to techniques like binary 
instance retrieval, and often avoids consistency checks [36]. Local exclusion has 
minor but non-negligible effects, e.g., avoiding on average 42 additional
candidates for $\Phi_3$. Moreover, all certain answers were already found in the first run,
indicating suitability of using only the incomplete first run. 
Table 1. Effects of CQ answering on MTCQ answering for the TOBM T-crossing
S = 0, N = 3.

Query                                          $\Phi_1$   $\Phi_2$   $\Phi_3$   $\Phi_4$
Globally excluded candidates (%)               97.88      99.29      97.88      99.71
Globally and locally excluded candidates (%)   98.73      99.55      99.54      99.80
|cert_1| / |cert|                              1          1          1          1


However, leveraging CQ answering has its limitations. For $\Phi_4$ on the X-
crossing and N = 2, the first run excluded 99.83% of all candidates after 2.38 
minutes, leaving 960 candidates for the second run. However, this is no small 
task: for 200 time points in the data this leaves 180 seconds per time point to 
finish within 10 hours. Hence, each candidate must not take up more than 0.1875 
seconds per time point on average, which entails checking multiple edges in mul- 
tiple states. Experiments indicate each edge check to take a two-digit millisecond 
duration. Thus, to efficiently handle large candidate sets in the second run, we 
require further optimizations. 
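
The time budget in this argument can be re-derived with a few lines of arithmetic. The sketch below uses only the figures stated above (ten-hour limit, 200 time points, 960 remaining candidates); the function names are illustrative, and the 2.38 minutes already spent in the first run are ignored, as in the text.

```python
# Re-deriving the per-candidate time budget from the stated figures.
TIME_LIMIT_S = 10 * 60 * 60   # ten-hour limit in seconds
TIME_POINTS = 20 * 10         # 20 seconds sampled at 10 Hz
CANDIDATES = 960              # candidates left after the first run


def budget_per_time_point(limit_s: float, time_points: int) -> float:
    """Seconds available for each time point within the overall limit."""
    return limit_s / time_points


def budget_per_candidate(limit_s: float, time_points: int, candidates: int) -> float:
    """Seconds available per candidate and time point."""
    return budget_per_time_point(limit_s, time_points) / candidates


if __name__ == "__main__":
    print(budget_per_time_point(TIME_LIMIT_S, TIME_POINTS))
    print(budget_per_candidate(TIME_LIMIT_S, TIME_POINTS, CANDIDATES))
```

With edge checks in the two-digit millisecond range, a budget of under 0.2 seconds per candidate and time point leaves room for only a handful of checks, which is consistent with the observed timeout.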


7 Conclusion 


In this work, we introduced MTCQs as a suitable tool for situation recognition 
when testing requirements in complex operational domains, as illustrated by ur- 
ban automated driving. Our tool, based on OPENLLET, brings MTCQ answering 
into practice by leveraging efficient CQ answering algorithms. Our custom bench- 
marks on safety-critical traffic situations show feasibility of our implementation 
for test evaluation settings and a potential to use our tool in other domains. 
These include risk assessments of other automated transportation systems, e.g., 
trams, maritime vessels, or delivery robots, and big-data analyses, e.g., process 
mining in business applications over intricate real-world structures. 

As future work, we plan to investigate both practical optimizations and the- 
oretical adaptations for increasing performance. For the former, it is interesting 
to (i) study how one can reuse query answers in consecutive time points given 
that potentially only small portions of the data change, (ii) identify fragments 
of MTCQs that can be answered more efficiently in practice (e.g., for runtime 
verification), and (iii) treat the spatial information more efficiently. On the theoretical
side, it is interesting to study rewriting approaches, where the idea is to
reduce the computation of certain answers to query evaluation in a target logic 
such as first-order logic (possibly with +,<) or DatalogMTL [39]. The bene- 
fit of such rewriting approaches is that one can leverage existing systems for 
evaluation in the target language. First-order rewritings have been studied in 
the context of more lightweight ontology and query languages [4]. While query 
rewritings need not exist in general (for complexity reasons), they might be very 
fruitful for practically occurring queries and ontologies. 
Abstract. We present a novel decision procedure for a fragment of separation
logic (SL) with arbitrary nesting of separating conjunctions with boolean con- 
junctions, disjunctions, and guarded negations together with a support for the 
most common variants of linked lists. Our method is based on a model-based 
translation to SMT for which we introduce several optimisations—the most im- 
portant of them is based on bounding the size of predicate instantiations within 
models of larger formulae, which leads to a much more efficient translation of 
SL formulae to SMT. Through a series of experiments, we show that, on the fre- 
quently used symbolic heap fragment, our decision procedure is competitive with 
other existing approaches, and it can outperform them outside the symbolic heap 
fragment. Moreover, our decision procedure can also handle some formulae for 
which no decision procedure has been implemented so far. 


1 Introduction 


In the last decade, separation logic (SL) [[[5,[30] has become one of the most popular 
formalisms for reasoning about programs working with dynamically-allocated memory, 
including approaches based on deductive verification [B2], abstract interpretation [B4], 
symbolic execution [BJ], or bi-abductive analysis [6]|[2]|18]. The key ingredients of SL 
used in these approaches include the separating conjunction $*$, which allows modular
reasoning by stating that the program heap can be decomposed into disjoint parts satis- 
fying operands of the separating conjunction, along with inductive predicates describing 
shapes of data structures, such as lists, trees, or their various combinations. 

The high expressive power of SL comes with the price of high complexity and even 
undecidability when several of its features are combined together. The existing decision 
procedures are usually limited to the so-called symbolic heap fragment that disallows 
any boolean structure of spatial assertions. 

In this paper, we present a novel decision procedure for a fragment of SL that we 
call boolean separation logic (BSL). The fragment allows arbitrary nesting of sepa- 
rating conjunctions and boolean connectives of conjunction, disjunction, and a limited 
form of negation of the form $\varphi \land \neg\psi$ called guarded negation. To the best of our knowledge,
no existing, practically applicable decision procedure supports a fragment with
such a rich boolean structure and at least basic inductive predicates. The decision pro- 
cedure for SL in CVC5 supports arbitrary nesting of boolean connectives (including 
even unguarded negation, which is considered very expensive in the context of SL) but 
no inductive predicates. A support for conjunctions and disjunctions under separating 
conjunctions is available in the backend solver of the GRASSHOPPER verifier
though not described in the papers. In our experimental evaluation, we outperform both 
of these approaches on some benchmarks (and can decide some formulae beyond the 
capabilities of both of them). We further show that adding guarded negations to BSL 
makes its satisfiability problem PSPACE-hard. 


To motivate the usefulness of the fragment we consider, we now give several ex- 
amples when SL formulae with a rich boolean structure are useful. First, in symbolic 
execution of heap manipulating programs, one usually needs to consider functions that 
involve some non-determinism; typically, at least the malloc statement has the
non-deterministic contract $\{\mathsf{emp}\}\ x = \mathtt{malloc()}\ \{x \mapsto f \lor (x = \mathsf{nil} \land \mathsf{emp})\}$ (where f
is a fresh variable) stating that when the statement is started in the empty heap, once it
finishes, x is either allocated, or the allocation has failed and the heap is empty. Such
contracts typically need a dedicated (and usually incomplete) treatment when no
support of disjunctions is available. Further, the guarded negation $\varphi \land \neg\psi$ semantically represents
the set of counterexamples of the entailment $\varphi \models \psi$, and hence allows one to reduce
entailment queries to UNSAT checking. Guarded negation can also be used when one
needs to obtain several models of a formula $\varphi$ by joining formulae representing the
already obtained models to $\varphi$ using guarded negations. One can also use the guarded
negation to express interesting properties such as the fact that given a list $\mathsf{sls}(x, y)$ and
a pointer $y \mapsto z$, the pointer does not point back somewhere into the list closing a lasso.
This can be expressed through the formula $(\mathsf{sls}(x, y) \land \neg(\mathsf{sls}(x, z) * \mathsf{sls}(z, y))) * y \mapsto z$.
Finally, boolean connectives can be introduced by translating quantitative separation 
logic into the classical SL [2]. 
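
The reduction of entailment to unsatisfiability mentioned above can be spelled out as a one-line equivalence. This is a sketch of the standard argument in our own notation, not a statement quoted from the paper's proofs: a stack-heap model is a counterexample to the entailment exactly when it satisfies the left-hand side but not the right-hand side.

```latex
\varphi \models \psi
\quad\Longleftrightarrow\quad
\text{there is no model } (s,h) \text{ with } (s,h) \models \varphi \text{ and } (s,h) \not\models \psi
\quad\Longleftrightarrow\quad
\varphi \land \neg\psi \text{ is unsatisfiable.}
```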

In this work, we consider BSL with three fixed, built-in inductive predicates repre- 
senting the most-common variants of lists: singly-linked (SLL), doubly-linked (DLL), 
and nested singly-linked (NLL). Our results can be easily extended for their variations 
such as nested doubly-linked lists of singly-linked lists and the like, but for the price of 
manually defining their semantics in the SMT encoding. We do, however, believe that 
our approach of bounding the sizes of models and instantiations of the individual pred- 
icates can be lifted to more complex inductive definitions and can serve as a starting 
point for allowing integration of SL with inductive definitions into SMT. 


Contributions. Our approach to deciding BSL formulae is inspired by previous works 
on translation of SL to SMT. The early works [27] and translate SL to intermediate 
theories first. Our approach is closer to the more recent approach of [16], which builds
on small-model properties and axiomatizes reachability through pointer links directly. 
We extend the SL fragment considered in by going beyond the so-called unique 
footprint property (under which it is much easier to obtain an efficient translation). Fur- 
ther, we define a more precise way to obtain global bounds on models of entire formu- 
lae, and, most importantly, we modify the translation of inductive predicates in a way 
that allows us to encode them succinctly by computing local bounds on their instanti- 
ations. According to our experiments, this makes the decision procedure efficient and 
competitive with the state-of-the-art approaches on the symbolic heap fragment (despite 
the increased decisive power). The claims we make in this paper are proven in [9]. 


? Note that, while the post-condition with a single disjunction might seem simple, the formulae 
typically start growing in the further symbolic execution. 
Related work. In [3], a proof system for deciding entailments of symbolic heaps with
lists was proposed. This problem was later shown to be solvable in polynomial time 
in [8] via graph homomorphism checking. A superposition-based calculus for the frag- 
ment was presented in [23], and a model-based approach enhancing SMT solvers was 
proposed in [24]. A combination of SL with SMT theories has also been considered but is still
limited to the symbolic heap fragment. A more expressive boolean structure and inte- 
gration with SMT theories was developed in for lists and extended for trees in 
but still without a support for guarded negations. 

Other decision procedures are focusing on more general, user-defined inductive 
predicates (usually of some restricted form). They are based, e.g., on cyclic proof
systems (CYCLIST [5], S2S [19,20]); lemma synthesis (SONGBIRD [33]); or automata:
tree automata are used in the tools SLIDE and SPEN [11], and a specialised type of
automata, called heap automata, is used in HARRSH. These procedures do, however,
not support nested use of boolean connectives and separating conjunctions.

There also exist works on deciding much more expressive fragments of SL such 
as [10,14,21,26] but they do not lead to practically implementable decision procedures.


2 Preliminaries 


Partial functions. We write $f : X \rightharpoonup Y$ to denote a partial function from X to Y. For a
partial function f, $dom(f)$ and $img(f)$ denote its domain and image, respectively; $|f| =
|dom(f)|$ denotes its size, and $f(x) = \bot$ denotes that f is undefined for x. A restriction
$f|_A$ of f to $A \subseteq X$ is defined as $f(x)$ for $x \in A$ and undefined otherwise. To represent
a finite partial function f, we often use the set notation $f = \{x_1 \mapsto y_1, \ldots, x_n \mapsto y_n\}$
meaning that f maps each $x_i$ to $y_i$, and is undefined for other values. We call partial
functions $f_1$ and $f_2$ disjoint if $dom(f_1) \cap dom(f_2) = \emptyset$ and define their disjoint union
$f_1 \uplus f_2$ as $f_1 \cup f_2$, which is otherwise undefined.
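
The notions above map directly onto finite dictionaries. The following minimal sketch is our own illustration (the helper names `dom`, `img`, `restrict`, and `disjoint_union` are chosen to mirror the notation and are not taken from the paper's implementation); an undefined disjoint union is represented by `None`.

```python
from typing import Dict, Hashable, Optional, Set

# A finite partial function is modelled as a dict; missing keys are "undefined".
PartialFn = Dict[Hashable, Hashable]


def dom(f: PartialFn) -> Set[Hashable]:
    """Domain of f: the arguments on which f is defined."""
    return set(f.keys())


def img(f: PartialFn) -> Set[Hashable]:
    """Image of f: the values f actually takes."""
    return set(f.values())


def restrict(f: PartialFn, a: Set[Hashable]) -> PartialFn:
    """Restriction f|_A: f on arguments in A, undefined elsewhere."""
    return {x: y for x, y in f.items() if x in a}


def disjoint_union(f1: PartialFn, f2: PartialFn) -> Optional[PartialFn]:
    """Disjoint union of f1 and f2, or None if their domains overlap."""
    if dom(f1) & dom(f2):
        return None
    return {**f1, **f2}


if __name__ == "__main__":
    f = {"a": 1, "b": 2}
    g = {"c": 1}
    print(disjoint_union(f, g), disjoint_union(f, {"a": 3}))
```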


Graphs and paths. Let $G = (V, \to_1, \ldots, \to_m)$ be a directed graph with vertices V and
edges $\to\ =\ \to_1 \cup \cdots \cup \to_m$. For $1 \le i \le m$, a sequence $\sigma = \langle v_0, v_1, \ldots, v_n \rangle \in V^+$
is a path from $v_0$ to $v_n$ via $\to_i$ in G, denoted as $\sigma : v_0 \leadsto_i v_n$, if all elements of $\sigma$ are
distinct, and for all $0 \le j < n$, it holds that $v_j \to_i v_{j+1}$. By the definition, paths cannot
be cyclic. The domain of the path $\sigma$ is the set $dom(\sigma) = \{v_0, v_1, \ldots, v_{n-1}\}$, and the
length of the path is defined as $|\sigma| = |dom(\sigma)| = n$.


Formulae. For a first-order formula $\varphi$, we denote by $\varphi[t/x]$ the formula obtained by
simultaneously replacing all free occurrences of the variable x in $\varphi$ with the term t. For
a first-order model M and a term t, we write $t^{M}$ to denote the evaluation of t in M
defined as usual.


3 Separation Logic 


Syntax. Let Vars be a countably infinite set of sorted variables. We denote by $x^{S}$ a
variable x of a sort $S \in Sort = \{\mathbb{S}, \mathbb{D}, \mathbb{N}\}$ representing a location in an SLL, DLL,
or NLL, respectively. We omit the sorts when they are not relevant or clear from the
context. We further assume that there exists a distinguished, unsorted variable nil. We
write $vars(\varphi)$ to denote the set of all variables in $\varphi$ plus nil (even when it does not
appear in $\varphi$). Analogically, $vars_S(\varphi)$ stands for all variables of the sort S plus nil.
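$(s,h) \models x \bowtie y$ iff $s(x) \bowtie s(y)$ and $dom(h) = \emptyset$, for $\bowtie\ \in \{=, \neq\}$

$(s,h) \models x \mapsto \langle \mathtt{f}_i : f_i \rangle_{i \in I}$ iff $h = \{ s(x) \mapsto \langle \mathtt{f}_i : s(f_i) \rangle_{i \in I} \}$

$(s,h) \models \psi_1 \bowtie \psi_2$ iff $(s,h) \models \psi_1 \bowtie (s,h) \models \psi_2$, for $\bowtie\ \in \{\land, \land\neg, \lor\}$

$(s,h) \models \psi_1 * \psi_2$ iff $\exists h_1, h_2.\ h = h_1 \uplus h_2 \neq \bot$ and $(s,h_i) \models \psi_i$ for $i = 1, 2$

$(s,h) \models \exists x.\ \psi$ iff there exists $\ell$ such that $(s[x \mapsto \ell], h) \models \psi$

$(s,h) \models \mathsf{sls}(x, y)$ iff $(s,h) \models x = y$, or $s(x) \neq s(y)$
and $(s,h) \models \exists n.\ x \mapsto n * \mathsf{sls}(n, y)$

$(s,h) \models \mathsf{dls}(x, y, x', y')$ iff $(s,h) \models x = y * x' = y'$, or $s(x) \neq s(y)$, $s(x') \neq s(y')$,
and $(s,h) \models \exists n.\ x \mapsto \langle \mathtt{n} : n, \mathtt{p} : y' \rangle * \mathsf{dls}(n, y, x', x)$

$(s,h) \models \mathsf{nls}(x, y, z)$ iff $(s,h) \models x = y$, or $s(x) \neq s(y)$
and $(s,h) \models \exists n, t.\ x \mapsto \langle \mathtt{n} : n, \mathtt{t} : t \rangle * \mathsf{sls}(n, z) * \mathsf{nls}(t, y, z)$
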


Fig. 1: The semantics of the separation logic. The existential quantifier is used for the 
definition of the semantics of inductive predicates and it is not a part of our fragment. 


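The syntax of our fragment is given by the following grammar:

$p ::= x^{\mathbb{S}} \mapsto \langle \mathtt{n} : n \rangle \mid x^{\mathbb{D}} \mapsto \langle \mathtt{n} : n, \mathtt{p} : p \rangle \mid x^{\mathbb{N}} \mapsto \langle \mathtt{n} : n, \mathtt{t} : t \rangle$  (points-to predicates)

$\pi ::= \mathsf{sls}(x^{\mathbb{S}}, y^{\mathbb{S}}) \mid \mathsf{dls}(x^{\mathbb{D}}, y^{\mathbb{D}}, x'^{\mathbb{D}}, y'^{\mathbb{D}}) \mid \mathsf{nls}(x^{\mathbb{N}}, y^{\mathbb{N}}, z^{\mathbb{S}})$  (inductive predicates)

$\varphi_A ::= x = y \mid x \neq y \mid p \mid \pi$  (atomic formulae)

$\varphi ::= \varphi_A \mid \varphi * \varphi \mid \varphi \land \varphi \mid \varphi \lor \varphi \mid \varphi \land\neg \varphi$  (formulae)

The points-to predicate $x \mapsto \langle \mathtt{f}_1 : f_1, \ldots, \mathtt{f}_n : f_n \rangle$ denotes that x is a structure
whose fields $\mathtt{f}_i$ point to values $f_i$. We often write $x \mapsto n$ instead of $x \mapsto \langle \mathtt{n} : n \rangle$ and
$x \mapsto \_$ if the right-hand side is not relevant. We call x the root of the points-to predicate.
If $\pi$ is an inductive predicate $\mathsf{sls}(x, y)$, $\mathsf{dls}(x, y, x', y')$, or $\mathsf{nls}(x, y, z)$, we again call x
the root of $\pi$, y is the sink of $\pi$, and we write $\pi(x, y)$ to denote the root and the sink.
We define the sort of the predicate $\pi$, denoted as $S_\pi$, as the sort of its root. Then, there
is a one-to-one correspondence of predicates and sorts, which we often implicitly use.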


Memory model. Let Loc be a countably infinite set of memory locations, and let $Field =
\{\mathtt{n}, \mathtt{p}, \mathtt{t}\}$ be the set of fields. A stack is a finite partial function $s : Vars \rightharpoonup Loc$. A heap
is a finite partial function $h : Loc \rightharpoonup (Field \rightharpoonup Loc)$. For succinctness, we write $h(\ell, \mathtt{f})$
instead of $h(\ell)(\mathtt{f})$. To represent heap elements in a readable way, we write functions
$Field \rightharpoonup Loc$ as vectors with labels, i.e., $h(\ell) = \langle \mathtt{f} : h(\ell, \mathtt{f}) \mid \mathtt{f} \in Field \land h(\ell, \mathtt{f}) \neq \bot \rangle$,
and we write $img(h)$ for $\{\ell \in Loc \mid \exists \ell', \mathtt{f}.\ h(\ell', \mathtt{f}) = \ell\}$. Moreover, we use $h(\ell) = n$
when $h(\ell) = \langle \mathtt{n} : n \rangle$. A stack-heap model is a pair $(s, h)$ where s is a stack and h is
a heap such that $s(\mathsf{nil}) \neq \bot$ and $h(s(\mathsf{nil})) = \bot$. We define the set of locations of the
model $(s, h)$ as $locs(s, h) = img(s) \cup dom(h) \cup img(h)$.
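
To make the memory model concrete, the following minimal sketch represents stacks and heaps as dictionaries and computes the set of locations. It is our own illustration under the assumption that locations are integers and fields are the strings "n", "p", and "t"; none of the names come from the paper's implementation.

```python
from typing import Dict, Set

Stack = Dict[str, int]             # variable name -> location
Heap = Dict[int, Dict[str, int]]   # location -> (field -> location)
NIL = "nil"


def is_stack_heap_model(s: Stack, h: Heap) -> bool:
    """nil must be on the stack and must not be allocated in the heap."""
    return NIL in s and s[NIL] not in h


def heap_img(h: Heap) -> Set[int]:
    """All locations that some allocated cell points to via some field."""
    return {target for cell in h.values() for target in cell.values()}


def locs(s: Stack, h: Heap) -> Set[int]:
    """locs(s, h) = img(s) ∪ dom(h) ∪ img(h)."""
    return set(s.values()) | set(h.keys()) | heap_img(h)


if __name__ == "__main__":
    # A two-cell singly-linked list segment from x to y: 1 -> 3 -> 2.
    stack = {NIL: 0, "x": 1, "y": 2}
    heap = {1: {"n": 3}, 3: {"n": 2}}
    print(is_stack_heap_model(stack, heap), sorted(locs(stack, heap)))
```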


Semantics. The semantics of our SL over stack-heap models is given in Fig. 1. For pure
formulae, we use the so-called precise semantics, which additionally requires that the
heap must be empty. The semantics of pointer assertions, boolean connectives, and


^ This is a common approach to avoid the atom true to be expressed as nil = nil. In our fragment,
we forbid true in order not to introduce “unbounded” negations as $\neg\varphi \triangleq \mathsf{true} \land\neg \varphi$. Due to
this change, symbolic heaps are formulae of the form $\psi_1 * \cdots * \psi_n$ where each $\psi_i$ is an atom.
separating conjunctions is as usual. The intuition behind the semantics of the inductive
predicates is as follows. An SLL segment $\mathsf{sls}(x, y)$ is either empty or represents an
acyclic sequence of allocated locations starting from x and leading via the n field to y,
which is not allocated. A DLL segment $\mathsf{dls}(x, y, x', y')$ is either empty with $x = y$ and
$x' = y'$, or it represents an acyclic sequence that is doubly-linked via the n and p fields
and leads from the first allocated location x of the segment to its last allocated location
x' (x and x' may coincide) with y/y' being the n/p-successors of x'/x, respectively. Both
y and y' are not allocated. An NLL segment $\mathsf{nls}(x, y, z)$ is a (possibly empty) acyclic
sequence of locations starting from x and leading to y via the t (top) field in which
the n-successor of each location starts a disjoint inner SLL leading to z via n.


Stack-heap graphs. We frequently identify stack-heap models with their graph
representation. A stack-heap model $(s, h)$ defines a graph $G[(s, h)] = (V, (\to_{\mathtt{f}})_{\mathtt{f} \in Field})$ where
$V = locs(s, h)$ and $u \to_{\mathtt{f}} v$ iff $h(u, \mathtt{f}) = v$. We frequently use the fact that if there exists
a path $\sigma : x \leadsto_{\mathtt{f}} y$ in a stack-heap graph, then it is uniquely determined because f-edges
are given by a partial function.
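
The uniqueness of paths suggests a simple deterministic traversal: since each location has at most one successor per field, following that field either reaches the target or fails. The sketch below is our own illustration (the function name `path` and the dictionary encoding of heaps are assumptions); it returns the path as a list of locations, or `None` if no path exists.

```python
from typing import Dict, List, Optional

Heap = Dict[int, Dict[str, int]]  # location -> (field -> location)


def path(h: Heap, start: int, end: int, field: str) -> Optional[List[int]]:
    """Return the unique path from start to end via `field`, or None.

    All elements of a path must be distinct, so the traversal stops as soon
    as it would revisit a location or reaches a location without a successor.
    """
    sequence = [start]
    seen = {start}
    current = start
    while current != end:
        successor = h.get(current, {}).get(field)
        if successor is None or successor in seen:
            return None
        sequence.append(successor)
        seen.add(successor)
        current = successor
    return sequence


if __name__ == "__main__":
    heap = {1: {"n": 3}, 3: {"n": 2}}
    print(path(heap, 1, 2, "n"))
```

The length of the returned path in the sense of the definition above is the number of elements minus one, since the last element is not part of the path's domain.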


4 Small-Model Property 


Small-model properties, which state that each satisfiable formula has a model of bound- 
ed size, are frequently used for various fragments of SL to prove their decidability [7] or 
to design decision procedures [16,26,29]. The latter is also the case of our translation-
based decision procedure which will heavily rely on enumeration over all locations, 
and, for its efficiency, it is therefore necessary to obtain location bounds that are as 
small as possible. 

The way we obtain our small-model property is inspired by the approach of 
and by insights from the so-called strong-separation logic [26]. The main idea is to 
define a satisfiability-preserving reduction $\downarrow^{s} h$ which takes a heap h (referenced from a
stack s), decomposes it into basic sub-heaps (which we call chunks), and reduces it per 
the sub-heaps in such a way that its size can be easily bounded by a linear expression. 
To define the reduction, we first need to introduce some auxiliary notions related to 
stack-heap models. 

We say that a model $(s, h)$ is positive if there exists $\varphi$ with $(s, h) \models \varphi$. A positive
model $(s, h)$ is atomic if it is non-empty, and for all positive models $(s, h_1)$ and $(s, h_2)$,
$h = h_1 \uplus h_2$ implies that $h_1 = \emptyset$ or $h_2 = \emptyset$. In other words, atomic models cannot be
decomposed into two non-empty positive models. Several examples of atomic models
are shown in Fig. 2. Observe that the models of dls (Fig. 2b) and nls (Fig. 2c) are
indeed atomic as any of their decompositions, in particular the split at the location u,
does not give two positive models.

A sub-heap $c \subseteq h$ is a chunk of a model $(s, h)$ if c is a maximal sub-heap of h such
that $(s, c)$ is an atomic positive model. Notice that the way the definition of chunks is
constructed excludes the possibility of using as a chunk a sub-heap of a heap that itself
forms an atomic model. The reason is that otherwise the remaining part of the larger
atomic model could not be described by the available predicates. For example, in nested
lists as shown in Fig. 2c, one cannot take as a chunk a part of some inner list (e.g., the
pointer $u \mapsto z$) as the heap shown in the figure itself forms an atomic model. Indeed, if
$u \mapsto z$ was removed, one would need a more general version of the NLL predicate to
cover the remaining heap by atomic models.
[Figure panels: (a) A singly-linked list sls(x, y). (b) A doubly-linked list dls(x, y, x', y'). (c) A nested singly-linked list nls(x, y, z).]


Fig. 2: An illustration of reductions of atomic models of inductive predicates. Removed 
heap locations are red, removed edges are dotted, and added edges are highlighted. 


Lemma 1 (Chunk decomposition). A positive model $(s, h)$ can be uniquely decomposed
into the set of its chunks, denoted $chunks(s, h)$, i.e., $h = \biguplus chunks(s, h)$.


Minimal atomic models of inductive predicates. The key reason why the small-model
property that we are going to state holds is that our fragment of SL cannot distinguish
atomic models of the considered predicates beyond certain small sizes, namely, two
for sls and nls, and three for dls. For further use, we will now state predicates describing
exactly the sets of the indistinguishable lists of the different kinds.

We start with SLLs and use a disequality to exclude empty lists: $\mathsf{sls}_{\geq 1}(x, y) \triangleq
\mathsf{sls}(x, y) * x \neq y$, and a guarded negation to exclude lists of length one consisting of a
single pointer only: $\mathsf{sls}_{\geq 2}(x, y) \triangleq \mathsf{sls}_{\geq 1}(x, y) \land\neg (x \mapsto y)$. A similar predicate can be
defined for NLLs too: $\mathsf{nls}_{\geq 2}(x, y, z) \triangleq (\mathsf{nls}(x, y, z) * x \neq y) \land\neg (x \mapsto \langle \mathtt{n} : z, \mathtt{t} : y \rangle)$.

For DLLs, we define $\mathsf{dls}_{\geq 2}(x, y, x', y') \triangleq \mathsf{dls}(x, y, x', y') * x \neq y * x \neq x'$ to exclude
models that are either empty or consist of a single pointer; and $\mathsf{dls}_{\geq 3}(x, y, x', y') \triangleq
\mathsf{dls}_{\geq 2}(x, y, x', y') \land\neg (x \mapsto \langle \mathtt{n} : x', \mathtt{p} : y' \rangle * x' \mapsto \langle \mathtt{n} : y, \mathtt{p} : x \rangle)$ to also exclude models
consisting of exactly two pointers.

It holds that atomic models, and consequently also chunks, are precisely either mod- 
els of single pointers or of the above predicates. 


Lemma 2. For an atomic model $(s, h)$, exactly one of the following conditions holds.


1. $(s, h) \models x \mapsto \_$ for some x. (pointer-atom)
2. $(s, h) \models \mathsf{sls}_{\geq 2}(x, y)$ for some x and y. (sls-atom)
3. $(s, h) \models \mathsf{dls}_{\geq 3}(x, y, x', y')$ for some x, y, x', and y'. (dls-atom)
4. $(s, h) \models \mathsf{nls}_{\geq 2}(x, y, z)$ for some x, y, and z. (nls-atom)


We can now define the reduction in the way we have already sketched. 


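Definition 1. The heap of a positive model $(s, h)$ reduces to $\downarrow^{s} h = \biguplus_{c \in chunks(s, h)} \downarrow^{s} c$
where the reduction of a chunk c with a root x is defined as follows:
- $\downarrow^{s} c = c$ if $(s, c) \models x \mapsto \_$.
- $\downarrow^{s} c = \{ s(x) \mapsto \ell,\ \ell \mapsto s(y) \}$ where $\ell = c(s(x), \mathtt{n})$ if $(s, c) \models \mathsf{sls}_{\geq 2}(x, y)$ for some y.
- $\downarrow^{s} c = \{ s(x) \mapsto \langle \mathtt{n} : \ell, \mathtt{p} : s(y') \rangle,\ \ell \mapsto \langle \mathtt{n} : s(x'), \mathtt{p} : s(x) \rangle,\ s(x') \mapsto \langle \mathtt{n} : s(y), \mathtt{p} : \ell \rangle \}$
where $\ell = c(s(x), \mathtt{n})$ if $(s, c) \models \mathsf{dls}_{\geq 3}(x, y, x', y')$ for some x', y' and y.
- $\downarrow^{s} c = \{ s(x) \mapsto \langle \mathtt{t} : \ell, \mathtt{n} : s(z) \rangle,\ \ell \mapsto \langle \mathtt{t} : s(y), \mathtt{n} : s(z) \rangle \}$ where $\ell = c(s(x), \mathtt{t})$ if
$(s, c) \models \mathsf{nls}_{\geq 2}(x, y, z)$ for some y and z.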
We lift the reduction to stack-heap models as $\downarrow^{X} (s, h) = (s', \downarrow^{s} h)$ where $s' = s|_X$
for some set of variables X and show that it preserves satisfiability when $X = vars(\varphi)$.


Theorem 1. For a positive model $(s, h)$, it holds that $(s, h) \models \varphi$ iff $\downarrow^{vars(\varphi)} (s, h) \models \varphi$.


The final step to show our small-model property is to find an upper bound on the
size of the reduced models. We define the size of a variable $x^{S}$, $\|x^{S}\|$, which represents
its contribution to the location bound, and is defined as 2 if $S \in \{\mathbb{S}, \mathbb{N}\}$ and 1.5 if
$S = \mathbb{D}$ (this corresponds to the size of a reduced chunk of sort S divided by the number
of variables which are allocated in it). We further define $\|\mathsf{nil}\| = 0$. The location bound
of $\varphi$ is then given as $bound(\varphi) = 1 + \lfloor \sum_{x \in vars(\varphi)} \|x\| \rfloor$ (the additional location is for
nil). Analogically, the location bound for a sort S is $bound_S(\varphi) = \lfloor \sum_{x \in vars_S(\varphi)} \|x\| \rfloor$.

Theorem 2 (Small-model property). If a formula $\varphi$ is satisfiable, then there exists a
model $(s, h) \models \varphi$ such that $|locs(s, h)| \le bound(\varphi)$.
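
The location bound is easy to compute from the sorts of the variables in a formula. The sketch below is our own illustration and rests on two assumptions: variables are given as a mapping from names to sort letters "S", "D", "N" (with nil omitted, since its size is 0), and the sum of sizes is rounded down, which is how we read the rounding brackets in the definition above.

```python
import math
from typing import Dict

# Size contribution of a variable by sort: 2 for SLL and NLL, 1.5 for DLL.
SIZE = {"S": 2.0, "D": 1.5, "N": 2.0}


def bound(var_sorts: Dict[str, str]) -> int:
    """Location bound of a formula: one location for nil plus the rounded-down sum."""
    return 1 + math.floor(sum(SIZE[sort] for sort in var_sorts.values()))


def bound_for_sort(var_sorts: Dict[str, str], sort: str) -> int:
    """Location bound restricted to variables of one sort."""
    return math.floor(sum(SIZE[s] for s in var_sorts.values() if s == sort))


if __name__ == "__main__":
    # Variables of a formula such as sls(x, y) * y -> z, all of the SLL sort.
    sll_vars = {"x": "S", "y": "S", "z": "S"}
    print(bound(sll_vars), bound_for_sort(sll_vars, "S"))
```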


We conjecture that the bound can be further improved, e.g., by showing that each
model can be transformed to an equivalent one (indistinguishable by BSL formulae)
such that the number of its chunks is bounded by the number of roots of spatial
predicates in $\varphi$. We demonstrate this on the formula $\mathsf{sls}(x, y) * y \mapsto z$ and its model in which y
points back into the middle of the list segment (thus splitting it into two chunks).
Clearly, this model can be transformed by redirecting z outside of the list domain.


5 Translation-Based Decision Procedure 


In this section, we present our translation of SL to SMT. We first present an SMT 
encoding of our memory model and a translation of basic predicates and boolean con- 
nectives. Then we discuss methods for efficient translation of separating conjunctions 
and inductive predicates with the focus on avoiding quantifiers by replacing them by 
small enumerations of their instantiations. 

We fix an input formula $\varphi$ and let $n_S = bound_S(\varphi)$ for each sort $S \in Sort$.


5.1 Encoding the Memory Model in SMT 
To encode the heap, we use a classical approach which encodes its mapping and domain
separately [16,27,29]. Namely, we use arrays to encode mappings and sets to encode
domains. We also use the theory of datatypes to represent a finite sort of locations by a
datatype $L \triangleq loc^{\mathsf{nil}} \mid loc^{\mathbb{S}}_1 \mid \ldots \mid loc^{\mathbb{S}}_{n_{\mathbb{S}}} \mid loc^{\mathbb{D}}_1 \mid \ldots \mid loc^{\mathbb{D}}_{n_{\mathbb{D}}} \mid loc^{\mathbb{N}}_1 \mid \ldots \mid loc^{\mathbb{N}}_{n_{\mathbb{N}}}$.

Now, we define the signature of the translation’s language over the sort L. For each
$x \in vars(\varphi)$, we introduce a constant x of the same name; its interpretation represents
the stack image $s(x)$. To represent the heap, we introduce a set symbol D representing
the domain and an array symbol $h_{\mathtt{f}}$ for each field $\mathtt{f} \in Field$ which represents the
mapping of the partial function $\lambda \ell.\ h(\ell, \mathtt{f})$. To distinguish sorts of locations, we further
introduce a set symbol $D_S$ for each sort $S \in Sort$. We define the meaning of these symbols
by showing how a stack-heap model can be reconstructed from a first-order model.


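Definition 2 (Inverse translation). Let M be a first-order model. We define its inverse
translation $T^{-1}_{\varphi}(M) = (s, h)$ where $s(x) = x^{M}$ if $x \in vars(\varphi)$ and
$h(\ell) = \langle \mathtt{n} : h_{\mathtt{n}}^{M}[\ell] \rangle$ if $\ell \in (D \cap D_{\mathbb{S}})^{M}$,
$h(\ell) = \langle \mathtt{n} : h_{\mathtt{n}}^{M}[\ell], \mathtt{p} : h_{\mathtt{p}}^{M}[\ell] \rangle$ if $\ell \in (D \cap D_{\mathbb{D}})^{M}$,
$h(\ell) = \langle \mathtt{n} : h_{\mathtt{n}}^{M}[\ell], \mathtt{t} : h_{\mathtt{t}}^{M}[\ell] \rangle$ if $\ell \in (D \cap D_{\mathbb{N}})^{M}$.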
To ensure consistency of the translation with the memory model used, we define the
following axioms that a result of translation needs to satisfy:

$A_{\varphi} \triangleq \mathsf{nil} = loc^{\mathsf{nil}} \land \mathsf{nil} \notin D \land \bigwedge_{S \in Sort} \big( D_S = \{ loc^{\mathsf{nil}}, loc^{S}_1, \ldots, loc^{S}_{n_S} \} \land \bigwedge_{x \in vars_S(\varphi)} x \in D_S \big)$.

The axioms ensure that nil is never allocated, that each variable is interpreted as a
location of the corresponding sort and they fix the interpretation of the sets $D_{\mathbb{S}}$, $D_{\mathbb{D}}$, $D_{\mathbb{N}}$,
which we will later use in the translation to assign sorts to locations.


5.2 Translation of SL to SMT 


We define the translation as a function $T(\varphi) \triangleq \mathcal{A}_\varphi \wedge T(\varphi, D)$
where $\mathcal{A}_\varphi$ are the axioms defined above and $T(\varphi, D)$ is a recursive
translation function of the formula $\varphi$ with the domain symbol $D$. The translation
$T(\cdot)$ and the inverse translation of models $T_\varphi^{-1}(\cdot)$ are linked by the
following correctness theorem.


Theorem 3 (Translation correctness). An SL formula $\varphi$ is satisfiable iff its translation
$T(\varphi)$ is satisfiable. Moreover, if $M \models T(\varphi)$, then $T_\varphi^{-1}(M) \models \varphi$.


The translation of non-inductive predicates and boolean connectives is defined as: 


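$$\begin{aligned}
T(x \bowtie y, F) &\triangleq x \bowtie y \wedge F = \emptyset && \text{for } \bowtie\ \in \{=, \neq\} \\
T(\psi_1 \bowtie \psi_2, F) &\triangleq T(\psi_1, F) \bowtie T(\psi_2, F) && \text{for } \bowtie\ \in \{\wedge, \vee, \wedge\neg\} \\
T(x \mapsto \langle f_i: f_i \rangle_{i \in I}, F) &\triangleq F = \{x\} \wedge \bigwedge_{i \in I} h_{f_i}[x] = f_i
\end{aligned}$$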


The translation of boolean connectives follows the boolean structure and propagates 
the domain symbol F to the operands. The translation of pointer assertions postulates 
content of memory cells represented by arrays and also requires the domain F to be {x}. 


Translation of separating conjunctions. The semantics of separating conjunctions
involves a quantification over sets (heap domains). The most direct way of translation is
to use quantifiers over sets, leading to decidable formulae due to the bounded location
domain. This approach, combined with a counterexample-guided quantifier instantiation,
is used in the decision procedure for a fragment of SL supported in CVC5 [29]. In some
fragments, however, separating conjunctions can be translated in a way that completely
avoids quantifiers. An example is the fragment of boolean combinations of symbolic
heaps, which has the so-called unique footprint property (UFP) [16,27]: a formula $\psi$
has a (unique) footprint in a model $(s, h)$ with $(s, h) \models \psi * true$⁵ if there exists a
(unique) set $F$ such that $(s, h|_F) \models \psi$. The UFP-based approaches axiomatize
the footprints during translation and check operands of separating conjunctions just
on the sub-heaps induced by their footprints.

However, UFP does not hold for BSL because of disjunctions. As an example, take
the formula $\psi \triangleq x \mapsto y \vee emp$ and the heap $h = \{x \mapsto y\}$. Both
$(s, h|_{\{s(x)\}}) \models \psi$ and $(s, h|_\emptyset) \models \psi$ hold. The sets $\{s(x)\}$ and
$\emptyset$ are, however, the only footprints of $\psi$ in $(s, h)$, and this observation can be
used to generalise the idea of footprints beyond the fragment in which they are unique.


5 Assuming the standard semantics of true, which is not part of our logic.
Instead of axiomatizing the footprints, our translation builds a set of footprint terms
for operands of separating conjunctions. This change can also be seen as a simplification
of the former translations as it eliminates the need to deal with two kinds of
formulae (the actual translation and footprint axioms), which must be treated differently
during the translation. However, the precise computation of the set of all footprints
of $\psi$ in $(s, h)$, denoted as $FP_{(s,h)}(\psi)$, is as hard as satisfiability: when the set
of footprints is non-empty, the formula $\psi$ is satisfiable. Therefore, we compute just an
over-approximation denoted as $FP^\#(\psi)$. This is justified by the following lemma, which
gives an equivalent semantics of the separating conjunction in terms of footprints.


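Lemma 3. Let $\psi \triangleq \psi_1 * \psi_2$ and let $(s, h)$ be a model. Let $\mathcal{F}_1$ and
$\mathcal{F}_2$ be sets of sets of locations such that $FP_{(s,h)}(\psi_i) \subseteq \mathcal{F}_i$.
Then $(s, h) \models \psi_1 * \psi_2$ iff

$$\bigvee_{F_1 \in \mathcal{F}_1} \bigvee_{F_2 \in \mathcal{F}_2} \bigwedge_{i = 1,2} (s, h|_{F_i}) \models \psi_i \ \wedge\ F_1 \cap F_2 = \emptyset \ \wedge\ F_1 \cup F_2 = dom(h).$$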


Intuitively, to check whether a separating conjunction holds in a model, it is not
necessary to check all possible splits of the heap, but only the splits induced by (possibly
over-approximated) footprints of its operands. The lemma is therefore a generalisation
of UFP and leads to the following definition of the translation $T(\psi_1 * \psi_2, F)$:

$$\exists F_1 \in \mathcal{F}_1.\ \exists F_2 \in \mathcal{F}_2.\ T(\psi_1, F_1) \wedge T(\psi_2, F_2) \wedge F_1 \cap F_2 = \emptyset \wedge F = F_1 \cup F_2.$$


Here, we use a quantifier expression of the form $\exists x \in X.\ \psi$ as a placeholder that
helps us to define two methods which the translation can use for separating conjunctions:


— The method SatEnum computes the sets of footprints $\mathcal{F}_i$ as $FP^\#(\psi_i)$ (the
  computation is described below) and replaces expressions $\exists x \in X.\ \psi$ with
  $\bigvee_{x' \in X} \psi[x'/x]$ as in Lemma 3. This strategy is quite efficient in many practical
  cases when we can compute small sets of footprints $\mathcal{F}_1$ and $\mathcal{F}_2$.

— The method SatQuantif does not compute the sets $\mathcal{F}_i$ at all and replaces
  $\exists x \in X.\ \psi$ simply with $\exists x.\ \psi$. This strategy is better when the existential
  quantifier can be later eliminated by Skolemization or when the set of footprints would
  be too large.
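
To make the footprint-based semantics of Lemma 3 concrete, the following Python sketch evaluates a separating conjunction on an explicit model in two ways: by trying all heap splits, and by trying only the splits induced by given candidate footprints. It is an illustration under simplifying assumptions (a single pointer field, formulae as nested tuples, locations as integers); the function names and the tuple encoding are hypothetical and not part of ASTRAL.

```python
from itertools import combinations


def restrict(h, dom):
    """Heap h restricted to the locations in dom."""
    return {l: v for l, v in h.items() if l in dom}


def sat(s, h, phi):
    """Brute-force semantics of a tiny single-field fragment."""
    op = phi[0]
    if op == "eq":    # pure atoms require the empty heap
        return s[phi[1]] == s[phi[2]] and not h
    if op == "neq":
        return s[phi[1]] != s[phi[2]] and not h
    if op == "pto":   # exactly one allocated cell
        return h == {s[phi[1]]: s[phi[2]]}
    if op == "or":
        return sat(s, h, phi[1]) or sat(s, h, phi[2])
    if op == "and":
        return sat(s, h, phi[1]) and sat(s, h, phi[2])
    if op == "gneg":  # guarded negation
        return sat(s, h, phi[1]) and not sat(s, h, phi[2])
    if op == "star":  # all 2^|dom(h)| splits
        locs = list(h)
        return any(
            sat(s, restrict(h, set(f1)), phi[1])
            and sat(s, restrict(h, set(locs) - set(f1)), phi[2])
            for k in range(len(locs) + 1)
            for f1 in combinations(locs, k)
        )
    raise ValueError(op)


def star_via_footprints(s, h, psi1, psi2, cands1, cands2):
    """Check psi1 * psi2 using only splits induced by candidate footprints."""
    dom = set(h)
    return any(
        f1.isdisjoint(f2) and (f1 | f2) == dom
        and sat(s, restrict(h, f1), psi1)
        and sat(s, restrict(h, f2), psi2)
        for f1 in cands1 for f2 in cands2
    )


# (x |-> y  or  emp) * y |-> nil, with emp written as the pure atom x = x.
s = {"x": 1, "y": 2, "nil": 0}
psi1 = ("or", ("pto", "x", "y"), ("eq", "x", "x"))
psi2 = ("pto", "y", "nil")
cands1 = [frozenset(), frozenset({1})]   # interpreted footprints of psi1
cands2 = [frozenset({2})]                # interpreted footprint of psi2
```

With two candidates for the first operand and one for the second, only two splits are inspected, whereas the brute-force check ranges over all subsets of the heap domain.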


We now show how to compute the set of footprint terms $FP^\#(\psi)$. We again postpone
inductive predicates to Section 5.3. We just note that their footprints are unique.
The cases of pure formulae and pointer assertions follow directly from the definition of
their semantics, which requires the heap to be empty and a single pointer, respectively.

$$FP^\#(x \bowtie y) \triangleq \{\emptyset\} \ \text{ for } \bowtie\ \in \{=, \neq\} \qquad FP^\#(x \mapsto \_) \triangleq \{\{x\}\}$$


For the boolean conjunction, we can select, from the footprint sets of its operands, the
one with the smaller cardinality. Since negations have many footprints (consider, e.g.,
$\neg emp$), we define the case of the guarded negation by taking footprints of its guard. The
disjunction is the only case which brings non-uniqueness as we need to consider
footprints of both of its operands.

$$\begin{aligned}
FP^\#(\psi_1 \wedge\neg\, \psi_2) &\triangleq FP^\#(\psi_1) \qquad FP^\#(\psi_1 \vee \psi_2) \triangleq FP^\#(\psi_1) \cup FP^\#(\psi_2) \\
FP^\#(\psi_1 \wedge \psi_2) &\triangleq \text{if } |FP^\#(\psi_1)| \le |FP^\#(\psi_2)| \text{ then } FP^\#(\psi_1) \text{ else } FP^\#(\psi_2)
\end{aligned}$$
Finally, we define footprints of the separating conjunction by taking the union $F_1 \cup F_2$
for each pair $(F_1, F_2)$ of footprints of its operands. Notice that here $F_1 \cup F_2$ represents
an SMT term, therefore we cannot replace it with a disjoint union, which is not available
in the classical set theories in SMT. We can, however, use heuristics and filter out terms
for which we can statically determine that interpretations of $F_1$ and $F_2$ are not disjoint.

$$FP^\#(\psi_1 * \psi_2) \triangleq \{F_1 \cup F_2 \mid F_1 \in FP^\#(\psi_1) \text{ and } F_2 \in FP^\#(\psi_2)\}$$

We state the correctness of the footprint computation in the following lemma.


Lemma 4. Let $M$ be a first-order model with $M \models T(\varphi)$ and let $(s, h) = T_\varphi^{-1}(M)$.
Then we have $FP_{(s,h)}(\varphi) \subseteq \{F^M \mid F \in FP^\#(\varphi)\}$.
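
The rules for $FP^\#$ over the predicate-free fragment can be sketched in a few lines of Python. The sketch assumes formulae encoded as nested tuples and represents a footprint term symbolically as a frozenset of variable names (a union of singletons); this encoding, and the simple pruning of pairs that share a variable, are assumptions made for the illustration and only one possible instance of the static filtering mentioned above.

```python
def fp(phi, prune=True):
    """Over-approximated set of footprint terms of phi.

    A term is a frozenset of variables, read as the union of {x} for each x.
    """
    op = phi[0]
    if op in ("eq", "neq"):            # pure atoms: empty footprint
        return {frozenset()}
    if op == "pto":                    # pointer assertion: {x}
        return {frozenset({phi[1]})}
    if op == "gneg":                   # guarded negation: footprints of the guard
        return fp(phi[1], prune)
    if op == "or":                     # the only source of non-uniqueness
        return fp(phi[1], prune) | fp(phi[2], prune)
    if op == "and":                    # keep the smaller of the two sets
        a, b = fp(phi[1], prune), fp(phi[2], prune)
        return a if len(a) <= len(b) else b
    if op == "star":
        return {
            f1 | f2
            for f1 in fp(phi[1], prune)
            for f2 in fp(phi[2], prune)
            # A shared variable means the two terms can never be disjoint.
            if not (prune and f1 & f2)
        }
    raise ValueError(op)


emp = ("eq", "x", "x")
psi = ("star", ("or", ("pto", "x", "y"), emp), ("pto", "y", "z"))
terms = fp(psi)
```

For `psi`, the disjunction contributes two footprint terms and the pointer assertion one, so the separating conjunction yields the two terms `{x, y}` and `{y}`.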


5.3 Translation of Inductive Predicates 


To translate inductive predicates, we express them in terms of reachability and paths
in the heaps. While unbounded reachability cannot be expressed in first-order logic, we
can efficiently express bounded linear reachability in our encoding. The linearity means
that each path uses only a single field (which is not the case, e.g., for paths in trees).
All predicates in this section are parametrised with an interval $[m, n]$ which bounds the
length of the considered paths. When we do not state the bounds explicitly, we assume
conservative bounds $[0, bound_S(\varphi)]$ for a path starting from a root of a sort $S$. We
show how to compute more precise bounds in Section 6. We start with the translation
of reachability:

$$reach^{=n}(h, x, y) \triangleq h^n[x] = y \qquad reach^{[m,n]}(h, x, y) \triangleq \bigvee_{m \le i \le n} reach^{=i}(h, x, y)$$


Here, the predicate $reach^{=n}(h, x, y)$ expresses that $x$ can reach $y$ via a field represented
by the array $h$ in exactly $n$ steps. Similarly, $reach^{[m,n]}$ expresses reachability in $m$ to $n$
steps. Besides reachability, we will need a macro $path_C(h, x, y)$ expressing the domain
of a path from $x$ to $y$, or the empty set if such a path does not exist:

$$\begin{aligned}
path_C^{=n}(h, x, y) &\triangleq \bigcup_{0 \le i < n} C(h^i[x]) \\
path_C^{[m,n]}(h, x, y) &\triangleq \text{if } reach^{=m}(h, x, y) \text{ then } path_C^{=m}(h, x, y) \\
&\quad \ldots \\
&\quad \text{else if } reach^{=n}(h, x, y) \text{ then } path_C^{=n}(h, x, y) \text{ else } \emptyset
\end{aligned}$$
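
The reachability and path macros can be evaluated directly on a concrete interpretation, which may help in reading the definitions. The Python sketch below assumes that an array is a total dictionary over a small set of named locations; the helper names and example heaps are hypothetical.

```python
def iterate(h, x, n):
    """The location h^n[x]."""
    for _ in range(n):
        x = h[x]
    return x


def reach_eq(h, x, y, n):
    """x reaches y via h in exactly n steps."""
    return iterate(h, x, n) == y


def reach(h, x, y, m, n):
    """x reaches y via h in m to n steps."""
    return any(reach_eq(h, x, y, i) for i in range(m, n + 1))


def path_eq(h, x, y, n, C=lambda l: {l}):
    """Union of C over the first n locations on the h-path from x."""
    out = set()
    for i in range(n):
        out |= C(iterate(h, x, i))
    return out


def path(h, x, y, m, n, C=lambda l: {l}):
    """Domain of the shortest h-path from x to y of length in [m, n], else empty."""
    for i in range(m, n + 1):          # mirrors the if-then-else chain
        if reach_eq(h, x, y, i):
            return path_eq(h, x, y, i, C)
    return set()


# Simple path: a -> b -> c -> nil.
h_n = {"nil": "nil", "a": "b", "b": "c", "c": "nil"}
simple = path(h_n, "a", "c", 0, 3)

# Nested path: top-level list u -> v -> nil via t, inner lists via n ending in nil.
t = {"nil": "nil", "u": "v", "v": "nil", "p": "nil", "q": "nil"}
n = {"nil": "nil", "u": "p", "v": "q", "p": "nil", "q": "nil"}
nested = path(t, "u", "nil", 0, 4, C=lambda l: path(n, l, "nil", 0, 4))
```

The default `C` returns the singleton of the visited location (a simple path), while the nested variant passes a function that collects a whole inner path for each top-level location.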


The additional parameter $C$ is a function applied to each element of the path that
can be used to define nested paths. We define a simple path $path_\mathbb{S}^{[m,n]}(h, x, y) \triangleq
path_C^{[m,n]}(h, x, y)$ with $C \triangleq \lambda \ell.\ \{\ell\}$ and a nested path as
$path_\mathbb{N}^{[m,n]}(h_1, h_2, x, y, z) \triangleq path_C^{[m,n]}(h_1, x, y)$ with
$C \triangleq \lambda \ell.\ path_\mathbb{S}(h_2, \ell, z)$. In the case of the nested path, the
array $h_1$ represents the top-level path from $x$ to $y$, and $h_2$ represents nested paths
terminating in the common location $z$. Now we can define footprints of inductive predicates
using path terms as follows:

$$\begin{aligned}
FP^\#(\pi(x, y)) &\triangleq \{path_\mathbb{S}(h_n, x, y)\} \quad \text{for } \pi \in \{sls, dls\} \\
FP^\#(nls(x, y, z)) &\triangleq \{path_\mathbb{N}(h_t, h_n, x, y, z)\}
\end{aligned}$$
The common part of the translation $T(\pi(x, y), F)$ postulates the existence of a top-level
path from $x$ to $y$ and a domain $F$ based on this path (formalised in the formula
main_path below), and ensures that all locations have the correct sort (through the
formula typing). For DLLs, we add an invariant which ensures that its locations are
correctly doubly-linked (the back_links formula), and we further need a special treatment
of the cases when the list is empty as well as a special treatment for its roots and sinks
(cf. the formula boundaries). For NLLs, we add an invariant stating that an inner list
starts from each location in its top-level path (the inner_lists formula) and that those
inner paths are disjoint (the disjoint formula⁶).


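- $T(sls(x, y), F) \triangleq$ main_path $\wedge$ typing where
    main_path $\triangleq reach(h_n, x, y) \wedge F = path_\mathbb{S}(h_n, x, y)$ and typing $\triangleq F \subseteq D_\mathbb{S}$.
- $T(dls(x, y, x', y'), F) \triangleq$ empty $\vee$ nonempty where
    empty $\triangleq x = y \wedge x' = y' \wedge F = \emptyset$,
    nonempty $\triangleq x \neq y \wedge x' \neq y' \wedge$ main_path $\wedge$ boundaries $\wedge$ typing $\wedge$ back_links,
    main_path $\triangleq reach(h_n, x, y) \wedge F = path_\mathbb{S}(h_n, x, y)$,
    boundaries $\triangleq h_p[x] = y' \wedge h_n[x'] = y \wedge x' \in F \wedge y \notin F$,
    typing $\triangleq F \subseteq D_\mathbb{D}$,
    back_links $\triangleq \forall \ell.\ (\ell \in F \wedge \ell \neq x') \rightarrow h_p[h_n[\ell]] = \ell$.
- $T(nls(x, y, z), F) \triangleq$ main_path $\wedge$ typing $\wedge$ inner_lists $\wedge$ disjoint where
    main_path $\triangleq reach(h_t, x, y) \wedge F = path_\mathbb{N}(h_t, h_n, x, y, z)$,
    typing $\triangleq path_\mathbb{S}(h_t, x, y) \subseteq D_\mathbb{N} \wedge F \setminus path_\mathbb{S}(h_t, x, y) \subseteq D_\mathbb{S}$,
    inner_lists $\triangleq \forall \ell.\ \ell \in F \cap D_\mathbb{N} \rightarrow reach(h_n, h_n[\ell], z)$,
    disjoint $\triangleq \forall \ell_1, \ell_2.\ (\{\ell_1, \ell_2\} \subseteq F \wedge \ell_1 \neq \ell_2 \wedge h_n[\ell_1] = h_n[\ell_2]) \rightarrow h_n[\ell_1] \notin F$.
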
Path quantifiers. Invariants of paths are naturally expressed using universal quantifiers.
For quantifiers, however, we cannot directly take advantage of bounds on path lengths.
Therefore, similarly as for separating conjunctions, we use the idea of replacing
quantifiers by small enumerations of their instances, which is efficient when we can compute
small enough bounds on the paths. For example, if we know that the length of an $f$-path
with a root $x$ is at most two, it is enough to instantiate its invariant for $x$, $h_f[x]$, and
$h_f^2[x]$. This idea is formalised using expressions $\mathbb{P}^{\le n}_{(h,x)}\, \ell.\ \psi$, which we
call path quantifiers and which state that $\psi$ holds for all locations of the path with the
length $n$ starting from $x$ via the array $h$:

$$\mathbb{P}^{\le n}_{(h,x)}\, \ell.\ \psi \triangleq \bigwedge_{0 \le i \le n} \psi[h^i[x]/\ell]$$

If we need to quantify over nested paths, we need to use two path quantifiers (one for
the top-level path and one for the nested paths). The quantifiers in the last conjunct of
the NLL translation can be rewritten as
$\mathbb{P}_{(h_t,x)}\, \ell_1'.\ \mathbb{P}_{(h_t,x)}\, \ell_2'.\ \mathbb{P}_{(h_n,\ell_1')}\, \ell_1.\ \mathbb{P}_{(h_n,\ell_2')}\, \ell_2.\ \psi$.
In this expression, $\ell_1'$ and $\ell_2'$ range over locations in the top-level list, and $\ell_1$ and
$\ell_2$ range over locations in the nested paths starting from $\ell_1'$ and $\ell_2'$, respectively.
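
The expansion of a path quantifier into a finite conjunction can be sketched as a small term generator. The Python code below builds the instances as plain strings; the textual term syntax (`h[x]` for array selection) and the function names are assumptions made for this illustration and do not correspond to any concrete SMT-LIB output.

```python
def select_chain(h, x, i):
    """Textual term for h^i[x], e.g. h[h[x]] for i = 2."""
    term = x
    for _ in range(i):
        term = f"{h}[{term}]"
    return term


def path_quantifier(h, x, n, body):
    """Instances of body for the locations x, h[x], ..., h^n[x]."""
    return [body(select_chain(h, x, i)) for i in range(n + 1)]


# Instantiating a DLL-style back-link invariant on an h_n-path of length <= 2.
instances = path_quantifier(
    "h_n", "x", 2, lambda l: f"h_p[h_n[{l}]] = {l}"
)
conjunction = " and ".join(f"({c})" for c in instances)
```

A path bound of `n` yields `n + 1` instances, which is why this replacement pays off only when the computed bounds are small.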


6 In the consequent of the disjoint formula, we could also write $h_n[\ell_1] = z$ instead of
$h_n[\ell_1] \notin F$, but the latter leads to better performance of SMT solvers.


5.4 Complexity 


This section briefly discusses the complexity of the proposed decision procedure as well
as the complexity lower bound for the satisfiability problem in the considered fragment
of SL. We will use $\mathrm{SAT}(\omega_1, \ldots, \omega_n)$ to denote the satisfiability problem for a
sub-fragment constructed of atomic formulae and the connectives $\omega_i$, and
$\overline{\mathrm{SAT}}(\omega_1, \ldots, \omega_n)$ to denote the fragment where none of the connectives
$\omega_i$ appear.


Theorem 4. The procedure SatQuantif produces a formula of polynomial size, and, for
$\overline{\mathrm{SAT}}(\wedge\neg)$, it runs in NP. The procedure SatEnum runs in NP for $\overline{\mathrm{SAT}}(\vee)$.


Proof (sketch). When not considering the instantiation of quantifiers over footprints, 
both SatQuantif and SatEnum produce a formula T(y) of a polynomial size dom- 
inated by the translation of inductive predicates. For the variant of the translation of 
inductive predicates using universal quantifiers over locations, the size is O(n?) for 
SLLs and DLLs (dominated by the O (n?) size of the path term), and O(n?) for NLLs 
(dominated by $path_\mathbb{N}$). If the input formula does not contain guarded negations, then
all quantifiers can be eliminated using Skolemization. The translated formulae are then
in a theory decidable in NP (e.g., when sets are encoded as extended arrays [22]).

The procedure SatEnum can produce exponentially large formulae because of the 
footprint enumeration. This can be prevented if the input formula does not contain dis- 
junctions, in which case the footprints of all sub-formulae are unique, i.e., singleton 
sets. The translated formulae are then again in a theory decidable in NP. 


Theorem 5. $\mathrm{SAT}(\mapsto, \wedge\neg, \wedge, \vee, *)$ is PSPACE-complete.


Proof (sketch). Membership in PSPACE was proved for a more expressive fragment.
For the hardness part, we build on the reduction from QBF used in [7]. In this
reduction, the boolean value of a variable is represented by the corresponding SL
variable being allocated (always pointing to nil for simplicity). The fact that $x$ is false is
expressed using a negative points-to predicate stating that $x$ is not allocated. The
existential quantifier is expressed using the separating conjunction, and the universal
quantifier is obtained using the (unguarded) negation. (For details, see [7].)

We show that this reduction can be done without the unguarded negation and the
negative points-to assertion, using the guarded negation instead. The key observation is
that, for a QBF formula with variables $X$, we can express that all variables in $X$ can
have arbitrary boolean values as $arbitrary[X] \triangleq \mathop{*}_{x \in X} (x \mapsto nil \vee emp)$. In the
context of variables $X$, we can then express negation as $\neg F \triangleq arbitrary[X] \wedge\neg\, F$ and
the truth values of a variable $x$ as $\neg x \triangleq arbitrary[X \setminus \{x\}]$ and
$x \triangleq arbitrary[X] * x \mapsto nil$. The rest of the reduction then easily follows [7].


6 Optimised Bound Computation 


In many practical cases, the main source of complexity is the translation of induc- 
tive predicates, which heavily depends on the possible lengths of paths between lo- 
cations. We now propose how to bound the length of these paths based on the so-called 
SL-graphs which are graph representations of constraints imposed by SL formulae. 
SL-graphs were originally used for representing and deciding symbolic heaps with
lists in [8]. Here, we use their generalised form which captures must-relations holding
in all models of a given formula. Note that the nodes of the graphs are implicitly given 
by the domains of the involved relations, which themselves can be viewed as edges. 
Definition 3. An SL-graph of $\varphi$ is a tuple
$G[\varphi] = (\simeq, \not\simeq, (\mapsto_f, \leadsto_f, \circledast_f)_{f \in Field})$ where:


- $\simeq\ \subseteq vars(\varphi) \times vars(\varphi)$ is an equivalence relation called must-equality,

- $\not\simeq\ \subseteq vars(\varphi) \times vars(\varphi)$ is a symmetric relation called must-disequality,

- $\mapsto_f\ \subseteq vars(\varphi) \times vars(\varphi)$ is a must-f-pointer relation,

- $\leadsto_f\ \subseteq vars(\varphi) \times vars(\varphi)$ is an irreflexive must-f-path relation,

- $\circledast_f\ \subseteq vars(\varphi)^2 \times vars(\varphi)^2$ is a symmetric relation called must-f-path-disjointness.


Except $\circledast_f$, the components of $G[\varphi]$ represent atomic formulae (equalities,
disequalities, pointers, and paths, i.e., list segments) holding within all models of $\varphi$.
The fact that $(x_1, y_1) \circledast_f (x_2, y_2)$ states that, in all models of $\varphi$, the domains of
$f$-paths from $x_1$ to $y_1$ and from $x_2$ to $y_2$ are disjoint.

To compute the SL-graph $G[\varphi]$, we define some auxiliary notation. We define $G_\emptyset$
to be an SL-graph where all the relations are empty. We write $G \triangleleft \{x_i \bowtie_i y_i\}_{i \in I}$ to
denote the SL-graph $G'$ which is the same as $G$ with the elements $x_i \bowtie_i y_i$ for $i \in I$
added to the corresponding relations. We use $\sqcup$ and $\sqcap$ as a component-wise union and
intersection of SL-graphs, respectively. We define the disjoint union of SL-graphs as:

$$\begin{aligned}
G_1 \uplus G_2 \triangleq (G_1 \sqcup G_2)\ &\triangleleft \{x \not\simeq y \mid x \in alloc(G_1),\ y \in alloc(G_2), \text{ and } (x \text{ is not } nil \text{ or } y \text{ is not } nil)\} \\
&\triangleleft \{e_1 \circledast_f e_2 \mid f \in Field,\ e_1 \in paths_f(G_1), \text{ and } e_2 \in paths_f(G_2)\}.
\end{aligned}$$


Here, $paths_f(G)$ is defined as $\mapsto_f \cup \leadsto_f$, and the set of must-allocated variables is
$alloc(G) \triangleq \{x \mid \exists y, f.\ x \mapsto_f y \text{ or } (x \leadsto_f y \text{ and } x \not\simeq y)\} \cup \{nil\}$ (nil is added
for technical reasons). We further assume that all operations on SL-graphs ($\triangleleft$, $\sqcup$,
$\sqcap$, and $\uplus$) preserve relational properties (symmetry, transitivity, etc.) of the components
of SL-graphs by computing the corresponding closures after the operation is performed.
We compute the SL-graph $G[\varphi]$ as follows.


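$$\begin{aligned}
G[x = y] &\triangleq G_\emptyset \triangleleft \{x \simeq y\} & G[x \mapsto \langle f_i: f_i \rangle_{i \in I}] &\triangleq G_\emptyset \triangleleft \{x \mapsto_{f_i} f_i\}_{i \in I} \\
G[x \neq y] &\triangleq G_\emptyset \triangleleft \{x \not\simeq y\} & G[sls(x, y)] &\triangleq G_\emptyset \triangleleft \{x \leadsto_n y\} \\
G[\psi_1 \wedge\neg\, \psi_2] &\triangleq G[\psi_1] & G[dls(x, y, x', y')] &\triangleq G_\emptyset \triangleleft \{x \leadsto_n y,\ x' \leadsto_p y'\} \\
G[\psi_1 \wedge \psi_2] &\triangleq G[\psi_1] \sqcup G[\psi_2] & G[nls(x, y, z)] &\triangleq G_\emptyset \triangleleft \{x \leadsto_t y\} \\
G[\psi_1 \vee \psi_2] &\triangleq G[\psi_1] \sqcap G[\psi_2] & G[\psi_1 * \psi_2] &\triangleq G[\psi_1] \uplus G[\psi_2]
\end{aligned}$$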


Observe that we only approximate dls and nls. After the construction is finished, we 
apply the following rules for matching of pointers and for detection of inconsistencies. 


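$$\frac{x_1 \mapsto_f y_1 \quad x_2 \mapsto_f y_2 \quad x_1 \simeq x_2}{y_1 \simeq y_2}\ (\mapsto\text{-match})
\qquad
\frac{x \simeq y \quad x \not\simeq y}{\varphi \text{ is unsat}}\ (\text{contradiction})$$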
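
The two rules can be illustrated by a small saturation procedure over equivalence classes of variables. The Python sketch below assumes that the SL-graph is given as plain lists of must-equalities, must-disequalities and must-pointers; this input format and the union-find implementation are choices made for the illustration, not a description of ASTRAL's internals.

```python
def saturate(eqs, diseqs, pointers):
    """Apply pointer matching to a fixpoint, then check for a contradiction.

    eqs, diseqs: lists of variable pairs; pointers: list of (field, x, y).
    Returns (find, unsat) where find maps a variable to its class representative.
    """
    parent = {}

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra == rb:
            return False
        parent[ra] = rb
        return True

    for a, b in eqs:
        union(a, b)

    changed = True
    while changed:  # match rule: equal sources of f-pointers have equal targets
        changed = False
        for f1, x1, y1 in pointers:
            for f2, x2, y2 in pointers:
                if f1 == f2 and find(x1) == find(x2) and union(y1, y2):
                    changed = True

    # contradiction rule: a must-disequality inside one equivalence class
    unsat = any(find(a) == find(b) for a, b in diseqs)
    return find, unsat


ptrs = [("n", "x", "u"), ("n", "y", "v")]
find, unsat = saturate([("x", "y")], [("u", "v")], ptrs)
```

In the example, `x` and `y` are must-equal, so matching their `n`-pointers merges `u` and `v`, which contradicts the must-disequality between them.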


Tighter location bounds. Using SL-graphs, we can slightly improve the location bound
from Section 4 by considering equivalence classes of $\simeq$ instead of individual variables
(this can also be used to refine the path bound computation described later) and by
defining $\|x\| = 1$ if $x$ is a must-pointer, i.e., $x \mapsto_f y$ for some $f$ and $y$.
(a) Fragment of SL-graph $G[\varphi]$. (b) Graph $G^u$. (c) Graph $G^\ell$.


Fig. 3: An illustration of the bound computation for the path $\sigma$ from $a$ to $c$ on a fragment
of the SL-graph of $\varphi \triangleq (sls(a, b) * b \mapsto c * c \mapsto d * sls(d, a)) \wedge\neg\, (sls(a, c) * sls(c, a))$. The
highlighted edges denote the paths used to determine the bound $[1, 3]$.


Path bounds. We now fix an $f$-path $\sigma$ from $x$ to $y$ and show how to compute an interval
$[\ell, u]$ that gives bounds on its length. The computation of the path bounds runs in two
steps. In the first step, we compute an initial bound $[\ell_e^0, u_e^0]$ for each edge $e \in paths_f(G)$.
If $e$ is a pointer edge, its bound is given as $[1, 1]$. For a path edge $e = (a, b)$, we define
$\ell_e^0 = 1$ if $a \not\simeq b$ and $0$ otherwise, while $u_e^0$ is defined as $bound_S(\varphi) - \sum_{v \in V} \|v\|$ where
$V \triangleq \{v \in vars_S(\varphi) \mid v \text{ is not } x \text{ and } \exists u.\ (v, u) \circledast_f (x, y)\}$. This way, we exclude from
the computation of the initial upper bound the source $v$ of each path disjoint with $\sigma$ and
all locations possibly allocated in a chunk with the root $v$. Note that the actual size of
this chunk may be smaller than $\|v\|$, but this means that we were too conservative when
computing the global location bound and can decrease the path bound by the same
number anyway.

In the second step, we compute the bounds of the path $\sigma$ using initial bounds from
the first step. The computation is based on two weighted directed graphs derived from
the SL-graph $G$: $G^u$ for the upper bound and $G^\ell$ for the lower bound (in both cases,
the vertices are implicitly given as $vars(\varphi)$, and the edge weight of an edge $e$ is given
by $u_e^0$ and $\ell_e^0$ computed in the previous step, respectively):

$$\begin{aligned}
G^u &\triangleq \{a \rightarrow b \mid (a, b) \in paths_f(G)\}, \\
G^\ell &\triangleq \{a \rightarrow b \mid (a \mapsto_f b \text{ and } a \not\simeq y) \text{ or } \\
&\qquad\quad (a \leadsto_f b \text{ and } \exists w.\ nonempty(y, w) \text{ and } (y, w) \circledast_f (a, b))\}.
\end{aligned}$$


Here, the condition $nonempty(y, w)$ states that a directed SL-graph edge $(y, w)$ is
non-empty, which holds if either $y \mapsto_f w$, or $y \leadsto_f w$ and $y \not\simeq w$.

Intuitively, the upper bound $u$ is computed as the length of the shortest path from $x$
to $y$ in $G^u$. Since $f$-paths are uniquely determined, we know that no path can be longer
than the shortest one, and thus $u$ is indeed a correct upper bound. The lower bound $\ell$ is
computed as the length of the longest path starting from $x$ (ending anywhere) in $G^\ell$. By
construction, $G^\ell$ contains only those edges for which one can prove that they cannot
contain $y$ in their domains. A path from $x$ of a length $\ell$ therefore implies that $x$ cannot
reach $y$ in less than $\ell$ steps, and thus $\ell$ is indeed a correct lower bound.


Example. We demonstrate the path bound computation in Fig. 3, which shows a fragment
of the SL-graph of a formula $\varphi$ (it shows only those $\circledast_n$ edges that are relevant
in our example) and the graphs $G^u$ and $G^\ell$ for the path $\sigma$ from $a$ to $c$. We have that
$\|b\| = \|c\| = 1$ and $\|a\| = \|d\| = 2$. This gives us the location bound, which is 6. In
the first phase, we compute the initial bound $[0, 2]$ for paths of the predicates $sls(a, b)$
and $sls(d, a)$ because both of them are disjoint with all the other paths in $G[\varphi]$. In the
second phase, we get the bound for $\sigma$ equal to $[1, 3]$ instead of the default bound $[0, 6]$.
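
The second phase can be sketched as two standard graph computations: a shortest-path query for the upper bound and a longest-path search for the lower bound. The Python code below is a sketch under stated assumptions: the edge sets and weights are one reading of the example (the figure itself is not reproduced here), and the longest path is computed by a depth-first search over simple paths, which is adequate only for small graphs.

```python
import heapq


def shortest(edges, src, dst):
    """Dijkstra: weight of the shortest src -> dst path, or None if unreachable."""
    dist = {src: 0}
    queue = [(0, src)]
    while queue:
        d, v = heapq.heappop(queue)
        if v == dst:
            return d
        if d > dist.get(v, float("inf")):
            continue  # stale queue entry
        for (a, b), w in edges.items():
            if a == v and d + w < dist.get(b, float("inf")):
                dist[b] = d + w
                heapq.heappush(queue, (d + w, b))
    return None


def longest_from(edges, src, seen=frozenset()):
    """Weight of the longest simple path starting in src (exhaustive DFS)."""
    seen = seen | {src}
    best = 0
    for (a, b), w in edges.items():
        if a == src and b not in seen:
            best = max(best, w + longest_from(edges, b, seen))
    return best


# Assumed upper-bound graph: all pointer and path edges, weighted by u0.
upper = {("a", "b"): 2, ("b", "c"): 1, ("c", "d"): 1, ("d", "a"): 2}
# Assumed lower-bound graph for the path from a to c, weighted by l0.
lower = {("a", "b"): 0, ("b", "c"): 1, ("d", "a"): 0}

bound = (longest_from(lower, "a"), shortest(upper, "a", "c"))
```

Under these assumptions the sketch reproduces the interval $[1, 3]$ from the example: the shortest route from `a` to `c` passes through `b` with weight 2 + 1, and the longest path from `a` in the lower-bound graph has weight 0 + 1.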
7 Experimental Evaluation


We have implemented the proposed decision procedure in a new solver called ASTRAL⁷.
ASTRAL is written in OCaml and can use multiple backend SMT solvers. With the
encoding presented in Section 5, it can use either CVC5 supporting set theory directly
or Z3 supporting it by a reduction to the extended theory of arrays [22]. We have also
developed an alternative encoding in which both locations and location sets are
represented as bitvectors. The bitvector encoding differs only in expressing set operations
on the level of bitvectors with additional axioms ensuring that all locations "can fit"
into sets encoded by the bitvectors (for details, see [9]). With the bitvector encoding,
a backend solver only needs to support theories of bitvectors and arrays, which are
both standard and supported by many other SMT solvers. Another advantage is that the
quantification on bitvectors seems to perform significantly better than on sets.

In our experiments, if we do not say explicitly which encoding and solver is used, 
we use the bitvector encoding and BITWUZLA [25] as the backend solver, which we 
found to be the best performing combination. We set a limit for the method SatEnum to 
64 footprints. If this limit is exceeded, we dynamically switch to SatQuantif. We
use path quantifiers when the path bound is at most half of the domain bound. These 
are design choices that can be revisited in the future. 

All experiments were run on a machine with 2.5 GHz Intel Core i5-7300HQ CPU
and 16 GiB RAM, running Ubuntu 18.04. The timeout was set to 60 s and the memory
limit to 1 GB. Our experiments were conducted using BENCHEXEC [4], a framework
for reliable benchmarking.


7.1 Entailments of Symbolic Heaps 


In the first part of our evaluation, we focus on formulae from the symbolic heap frag- 
ment which is frequently used by verification tools and for which there exist many 
dedicated solvers. We therefore do not expect to outperform the best existing tools but 
rather to obtain a comparison with other translation-based decision procedures. 

In Table 1a, we provide results for the category QF_SHLS_ENTL (entailments
with SLLs). We divide the category into two subsets: verification conditions (which are
simpler) and more complex artificially generated formulae "bolognesa" and "clones"
from [23]. During the experiments, we found out that several "cloned" entailments
contain root variables on the right-hand side of the entailment that do not appear on the
left-hand side, making the entailment trivially invalid when its left-hand side is
satisfiable. For a few hard clone instances, this poses a problem for ASTRAL as it cannot
use the path bound computation because such roots do not appear in the SL-graph. We
have therefore implemented a heuristic that detects entailments $\varphi \models \psi$ that can be
reduced to satisfiability of $\varphi$. Since this is a benchmark-specific heuristic, we also
present the version without this heuristic (ASTRAL*) in Table 1a. The optimised version
of ASTRAL is able to solve all the formulae, being faster than the other translation-based
solvers GRASSHOPPER⁸ and SLOTH. For illustration, the table further contains the
second best solver in the latest edition of SL-COMP, S2S⁹.


7 https://github.com/TDacik/Astral 


8 Since GRASSHOPPER is not a solver but a verification tool, we encode the entailment
checking as a verification of an empty program.

9 We had technical issues running the winner ASTERIX [24]. The difference between those tools
is, however, negligible.
Table 1: Experimental results for formulae from SL-COMP. The columns are: solved
instances (OK), out of time/memory (RO), instances on which ASTRAL wins, i.e., ASTRAL
can solve it and the other solver cannot, or ASTRAL solves it faster (WIN), instances solved
in the time limits of 0.1 s and 1 s, and the total time for solved instances in seconds.


(a) Results for the category QF_SHLS_ENTL.

                  Verification conditions (86)           bolognesa+clones (210)
Solver         OK  RO  WIN  <0.1   <1  Total time     OK   RO  WIN  <0.1   <1  Total time
ASTRAL*        86   0   42    83   86        4.64    195   15   88    64  150      408.48
GRASSHOPPER    86   0   70    52   86        8.65    203    7  148    60   87     1229.35
S2S            86   0    5    86   86        2.08    210    0    3   203  210        8.18
SLOTH          64   3   86     0   28      235.28     70  140  210     0   50      149.42


(b) Results for a subset of the category QF_SHLID_ENTL.

                  Doubly-linked lists (17)               Nested singly-linked lists (19)
Solver         OK  RO  WIN  <0.1   <1  Total time     OK   RO  WIN  <0.1   <1  Total time
GRASSHOPPER    17   0   16     3   15        7.53      -    -    -     -    -           -
HARRSH         17   0   17     0    0       95.18     14    5   18     0    0      183.01
S2S            17   0    0    17   17        0.15     19    0    0    19   19        0.43
SONGBIRD       11   5   14     5    9       13.39     HD    S    8     4   11        1.38


In Table 1b, we provide results for a subset of the category QF_SHLID_ENTL
(entailments with linear inductive definitions from which we selected DLLs and NLLs)
for ASTRAL and the three best-performing solvers competing in the latest edition of
SL-COMP: S2S, SONGBIRD (in the version with automated lemma synthesis called SLS),
and HARRSH. We also include GRASSHOPPER, which supports DLLs only. Except for
S2S, which solves almost all formulae virtually immediately, ASTRAL is the only one
able to solve all the formulae in the given time limit.


7.2 Experiments on Formulae Outside of the Symbolic Heap Fragment 


For formulae outside of the symbolic heap fragment and its top-level boolean closure, 
there are currently no existing benchmarks. For now, we therefore limit ourselves to 
randomly generated but extensive sets of formulae. In the future, we would like to 
develop a program analyser using symbolic execution over BSL and make more careful 
experiments on realistic formulae. 

We first focus on the fragment with guarded negations but without inductive
predicates, on which we can compare ASTRAL with CVC5. We have prepared a set of 1000
entailments of the form $\varphi \models \psi$ which are generated as random binary trees with depth 8
over 8 variables with the only atoms being pointer assertions. To reduce the number
of trivial instances, we only generated formulae for which $vars(\psi) \subseteq vars(\varphi)$ and
ASTRAL cannot deduce contradiction from their SL-graphs. To avoid any suspicion
that the difference is caused by better performance of the backend solver rather than
the design of our translation, we used ASTRAL with the CVC5 backend and direct set
encoding (with BITWUZLA and bitvector encoding, our results would be even better).
The results are given in Fig. 4a and suggest that our treatment of guarded negations
really brings a better performance: ASTRAL can solve all the instances and almost all
of them under 10 seconds. On the other hand, CVC5 timed out in 61 cases and is
usually slower than ASTRAL, in particular on satisfiable formulae which represent invalid
entailments.

(a) Comparison with CVC5 (b) Comparison with GRASSHOPPER


Fig. 4: A comparison of ASTRAL with CVC5 and GRASSHOPPER on randomly generated
formulae. Times are in seconds, axes are logarithmic. The timeout was set to 60 s.

In the second experiment, we compared our solver with GRASSHOPPER on the
fragment which it supports, i.e., arbitrary nesting of conjunctions and disjunctions. We
again generated 1000 entailments, this time with depth 6, 6 variables and with atoms
being singly-linked lists (with 20 % probability) or pointer assertions. The results are
given in Fig. 4b. ASTRAL ran out of memory in 5 cases, and GRASSHOPPER timed
out in 10 cases. In summary, ASTRAL is faster on more than 80 % of the formulae, with
an almost 3 times lower running time.

Finally, to illustrate that ASTRAL can indeed handle formulae out of the fragments
of all the other mentioned tools, we apply it on an entailment query that involves the
formula mentioned at the end of the introduction:
$((sls(x, y) \wedge\neg\, (sls(x, z) * sls(z, y))) * y \mapsto z) \models sls(x, z)$, converted to an
unsatisfiability query. ASTRAL resolves the query in 0.12 s. Note that without the
requirement $\neg(sls(x, z) * sls(z, y))$, the entailment does not hold as a cycle may be closed
in the heap.


8 Conclusions and Future Work 


We have presented a novel decision procedure based on a small-model property and 
translation to SMT. Our experiments have shown very promising results, especially 
for formulae with rich boolean structure for which our decision procedure outperforms 
other approaches (apart from being able to solve more formulae). 

In the future, we would like to extend our approach with some class of user-defined 
inductive predicates, with more complex spatial connectives such as septractions and/or 
magic wands, consider a lazy and/or interactive translation instead of the current eager 
approach, and try ASTRAL within some SL-based program analyser. 
Abstract. Session subtyping answers the question of whether a program
in a communicating system can be safely substituted for another, when 
their communication behaviours are described by session types. Asyn- 
chronous session subtyping is undecidable, hence the interest in devising 
sound, although incomplete, subtyping algorithms. State-of-the-art algo- 
rithms are formulated in terms of a data-structure called input trees. We 
show how input trees can be replaced by sets of traces, which opens up 
opportunities for applying abstract interpretation techniques
to the problem of asynchronous session subtyping. Sets of traces can be 
relaxed (enlarged) whilst still allowing subtyping to be observed, and 
one can choose relaxations that can be finitely represented, even when 
the input trees are arbitrarily large. We instantiate this strategy using 
regular expressions and show that it allows subtyping to be mechanically 
proven for communication patterns that were previously out of reach. 


Keywords: asynchrony, session subtyping, automata, abstract interpretation 


1 Introduction 


Protocols, which are used to communicate and orchestrate activity in distributed 
systems, are notoriously difficult to write and understand. Session types [23, 34] 
have thus been proposed for specifying protocol interaction and automatically 
checking whether an implementation conforms to its specification. Session types 
extend data types to describe communication behaviour, and express the be- 
haviour of units of design (sessions) in terms of which types of messages can 
be sent or received, and in what order. They have been integrated into main- 
stream languages and proved to be a powerful tool for static [25, 26, 28, 31, 32] 
and dynamic [1,2] verification as well as API generation [24, 30]. 


Session Subtyping A fundamental problem in the application of session types 
is checking whether the implementation of one component in a distributed sys- 
tem can be substituted for another, without violating an overarching protocol. 
This problem can be formulated as session subtyping [11,18,20,21], which is
a preorder relation on session types: S' is a sub-type of S, written S' ≤ S, if
a program with type S can be safely substituted by a program with type S'.
Consider S and S’ below: 


© The Author(s) 2024
B. Finkbeiner and L. Kovács (Eds.): TACAS 2024, LNCS 14570, pp. 207–226, 2024.
https://doi.org/10.1007/978-3-031-57246-3_12
[Automata diagrams: S' with transitions ?d, !b, ?c; S with transitions !a, !b, ?c.]


$S$ and $S'$ are expressed in automata notation where $!a$ (resp. $?a$) denotes a send 
(resp. receive) action on channel $a$. $S$ models a process which, in state $p_0$, can 
repeatedly request a service $a$, or request $b$ and then receive a confirmation $c$. 

The overarching protocol is defined, in a binary (client-server) session, as 
the parallel composition of $S$ with its dual $\overline{S}$, written $S \mid \overline{S}$. The dual $\overline{S}$ is 
obtained by swapping each send action with a corresponding receive action and 
vice versa. Due to syntactic constraints posed by session types [23], $S \mid \overline{S}$ enjoys 
a number of key properties (e.g., deadlock freedom, communication safety). A 
process behaving as $S$ can be safely substituted with another behaving as $S'$ 
that has fewer sends (e.g. the absent $!a$) and more receives (e.g. the additional $?d$). 
This notion of substitutability is co-variant on send actions and contra-variant 
on receive actions, and preserves the key properties in protocol $S' \mid \overline{S}$. 

We focus on asynchronous session subtyping (async subtyping for short) as 
asynchronous communications (over FIFO channels) are key in distributed 
systems and languages such as Go and Rust. Async subtyping, however, 
is undecidable [6,27] so the search is on for sound algorithms which are 
sufficiently robust to prove subtyping in the majority of cases. Given a candidate 
subtype and a supertype, the subtyping problem can be viewed as a simulation 
game in which the supertype is required to mirror any input and output action 
performed by the subtype. Since communication is asynchronous, the subtype 
can send early in the sense that the supertype can only realise the same output 
after some inputs. Consider $M_2$ below, which models a server producing a news 
feed ($!b$) on request from a client ($?a$), where $M_1$ is a candidate subtype for $M_2$: 


[Figure: automata for $M_1$ (left) and $M_2$ (right), built from receive transitions ?a and send transitions !b.] 


After receiving on $a$, $M_2$ can immediately mimic the first send on $b$ of $M_1$, 
but it can only perform the second send on $b$ after receiving another request. 
The input $a$ is said to guard the output $b$. One needs to reason about these 
dependencies to verify that $M_2$ can follow the actions of $M_1$, albeit with (a 
possibly unbounded number of) send actions being delayed. This is the challenge 
of asynchronous subtyping. Apart from substitutability, asynchronous subtyping 
enables protocol optimisation in which receives are postponed, so as to minimise 
busy waiting for messages [29]. In $M_2$, if feed production were more efficient than 
request processing then it would be better if the server bundled feeds, as in $M_1$. 


[Figure: a simulation tree (left) and a collecting simulation graph (right), where 
$S_0 = \{q_0\}$, $S_{i+1} = \{a \cdot \pi \mid \pi \in S_i\}$, $S = \bigcup_{i \geq 0} S_i$ and $S' = S \cup \{q_1\}$.] 

Fig. 1. A simulation tree (left) and collecting simulation graph (right) for $M_1$ and $M_2$ 


Existing techniques The state-of-the-art approach to async subtyping [4, 5] represents 
a simulation game between the (candidate) subtype and supertype, in 
its entirety, with a simulation tree. The state of the supertype is modelled using 
an input tree [4,5,11,10], which records and accumulates input actions which 
guard outputs. Figure 1 gives the simulation tree for $M_1$ and $M_2$. Simulation 
commences at $p_0 \preceq q_0$ where $M_1$ and $M_2$ are in their initial states $p_0$ and $q_0$. 
The edges in the tree follow the actions of $M_1$, with $M_2$ following along using its 
input tree. Step $p_0 \preceq T_2$ models the scenario where $M_1$ is in state $p_0$ but, in $M_2$, 
a second send on $b$ is guarded by a receive on $a$. Input tree $T_2 = \langle a : q_0 \rangle$ expresses 
this dependency by recording that $M_2$ can continue at $q_0$, after performing the 
pending receive on $a$. As the simulation of $M_1$ unfolds, however, the input trees 
for $M_2$ grow without bound, yielding an infinite simulation tree. 

Previous work [4,5] proposed a multi-step algorithm that computes a simu- 
lation tree until violation of a syntactic condition [5, Theorem 3.8] that is for- 
mulated in terms of the depth of input trees. The simulation tree is then divided 
into sub-trees, which are checked against a safety property [5, Definition 3.16]. 
The sub-trees are then used to generate systems of equations which are solved 
and checked against a compatibility condition [5, Definition 3.12]. The construc- 
tion is ingenious, but the length of the proofs [5, p. 14, p. 19-20, p. 22-26] begs 
the question of whether subtyping can be solved more simply. Furthermore, can 
a strategy be found that is amenable to independent algorithmic checking? This 
would explain why subtyping holds, further instilling confidence. 


Contribution Our development starts with the observation that an input tree 
can be represented, without loss of information, as a set of traces: one trace 
for each branch through the input tree. The rationale behind this encoding 
is that sets of traces can: (1) be relaxed (enlarged) and (2) be described as 
regular expressions. As to (1), a trace-based representation allows the subtyping 
algorithm to relax a set of traces to a strictly larger (possibly infinite) set, whilst 
still allowing subtyping to be observed. By covering all the sets of traces that arise 
in a simulation tree with a finite number of trace sets we can fold a simulation 
tree onto a graph to obtain a tractable (finite) representation. Regarding (2), 
(possibly infinite) sets of traces can themselves be finitely represented as regular 
expressions. For example, Figure 1 (right) shows a collecting simulation graph 
where the states of $M_2$ are relaxed to the sets of traces $S$ and $S'$, which can be 
represented, say, as $a^* q_0$ and $a^* q_0 + q_1$ respectively. The result is a subtyping 
algorithm equipped with relaxation and termination machinery which can prove 
subtyping on more (and more complex) problems than existing methods. 

The use of sets of traces separates the proof of correctness of the core 
algorithm from the problem of how to finitely represent sets of traces. This 
separation simplifies the theoretical development. If higher fidelity were required, 
regular expressions could be replaced with context-free grammars [13]; alterna- 
tively the relaxations employed with regular expressions (string widening [12]) 
can be tuned without revisiting the correctness of the core algorithm. 


Synopsis Section 2 introduces (session types as) communicating machines; 
Section 3 defines async subtyping with the formulation in [5] to facilitate 
comparison; and Section 4 gives a sound formulation based on collecting simulation 
graphs. Section 5 gives an algorithm based on regular expressions and widening 
over collecting simulation graphs, and introduces and evaluates our tool. 
Conclusions and related work are in Section 6. 


2 Preliminaries on Communicating Machines 


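Let $A$ denote a finite alphabet, and $\mathcal{A} = \{!,?\} \times A$ denote a finite set of send and 
receive actions. A communicating machine $M = (Q, q_0, \delta)$ (machine for short) 
is defined by a finite set of states $Q$, an initial state $q_0 \in Q$, and a transition 
relation $\delta \subseteq Q \times \mathcal{A} \times Q$. For a fixed machine $M = (Q, q_0, \delta)$, we write: 
$q \xrightarrow{\ell} q'$ iff $(q, \ell, q') \in \delta$; $q \xrightarrow{\ell}$ iff there exists $q'$ such that $q \xrightarrow{\ell} q'$; and 
$q_0 \xrightarrow{\ell_1 \cdots \ell_n} q_n$ iff there exist $q_1, \ldots, q_{n-1} \in Q$ such that $q_i \xrightarrow{\ell_{i+1}} q_{i+1}$ for $0 \leq i \leq n-1$. 
Given a sequence of labels $\bar{a} = a_1, \ldots, a_n$ and a direction $x \in \{!,?\}$, we write 
$x\bar{a}$ for the sequence of actions $xa_1, \ldots, xa_n$. The maps $\mathit{in}_M : Q \to \wp(A)$ and 
$\mathit{out}_M : Q \to \wp(A)$ are defined: 

$$\mathit{in}_M(q) = \{a \in A \mid q \xrightarrow{?a}\} \qquad \mathit{out}_M(q) = \{a \in A \mid q \xrightarrow{!a}\}$$

The predicate $\mathit{send}_M(q)$ holds iff $\mathit{out}_M(q) \neq \emptyset$ and $\mathit{recv}_M(q)$ holds iff 
$\mathit{in}_M(q) \neq \emptyset$. The predicate $\mathit{final}_M(q)$ holds iff $\neg\mathit{send}_M(q)$ and $\neg\mathit{recv}_M(q)$. 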


Definition 1 (Session types correspondence). For a given $M = (Q, q_0, \delta)$, 
$M$ is deterministic iff $(q, \ell, q_1), (q, \ell, q_2) \in \delta$ implies $q_1 = q_2$; $M$ has no mixed 
states iff $\neg\mathit{send}_M(q)$ or $\neg\mathit{recv}_M(q)$ for all $q \in Q$. A session type corresponds [19] 
to a deterministic machine without mixed states. 


Henceforth we focus on systems of two deterministic machines without mixed 
states, which correspond to binary session types. Binary session types describe 
two-party protocols (e.g., client-server protocols such as POP2 and SMTP). State-of-the-art 
asynchronous subtyping algorithms [5] are formulated on binary sessions (each session 
involving two rather than many participants). We focus on demonstrating how 
abstraction can be applied to these algorithms and thus, likewise, adopt the 
binary setting. 


Because $M$ is deterministic, the relation $\delta$ can be interpreted as a partial 
function $Q \times \mathcal{A} \to Q$ defined by $\delta(q, \ell) = q'$ iff $q \xrightarrow{\ell} q'$. Following [5] we introduce 
the predicate $\mathit{cycle}_M(!, q)$ to aid the characterisation of orphan messages: 

Definition 2. The predicate $\mathit{cycle}_M(!, q)$ holds iff there exist $\bar{a} \in A^*$, $\bar{b} \in A^+$ 
and $q' \in Q$ such that $q \xrightarrow{!\bar{a}} q'$ and $q' \xrightarrow{!\bar{b}} q'$. 


The predicate $\mathit{cycle}_M(!, q)$ thus holds iff from $q$ one can reach, using a possibly 
empty sequence of send actions, a cycle (from $q'$ to $q'$ itself) of send actions. The 
predicate $\mathit{cycle}_M(?, q)$ is defined analogously. 


3 Asynchronous Subtyping with Input Trees 


We define input trees and asynchronous subtyping, adopting the formulation 
of [5]. Input trees are defined over the states Q of a supertype. Asynchronous 
subtyping is then defined in terms of input trees, the trees capturing input 
accumulation for guarded outputs. 


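Definition 3. The set of input trees $T_Q$ over $Q$ is the least set such that: (1) 
if $q \in Q$ then $q \in T_Q$; (2) if $I$ is an index set, $\forall i \in I .\ a_i \in A, t_i \in T_Q$ and 
$\forall i, j \in I .\ i \neq j \to a_i \neq a_j$ then $\langle a_i : t_i \mid i \in I \rangle \in T_Q$. 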


An input tree over $Q$ is either a state in $Q$ or an accumulated input. A term 
of the form $\langle a_i : t_i \mid i \in I \rangle$ represents an accumulated input that presents an 
option $a_i$ for each $i \in I$, followed by a tree $t_i$. Note that any input tree of $T_Q$ 
is necessarily finite. The following definition shows how to build the input tree 
$\mathit{inTree}_M(q)$ for a state $q$ of a given machine $M$, and defines the associated set of 
leaves $\mathit{leaf}(t)$ of the input tree $t$. 


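Definition 4 (Input tree). Define $\mathit{inTree}_M : Q \to T_Q$ and $\mathit{leaf} : T_Q \to \wp(Q)$ 

$$\mathit{inTree}_M(q) = \begin{cases} \bot & \text{if } \mathit{cycle}_M(?, q) \\ q & \text{else if } \mathit{in}_M(q) = \emptyset \\ \langle a_i : \mathit{inTree}_M(\delta(q, ?a_i)) \mid i \in I \rangle & \text{else } \mathit{in}_M(q) = \{a_i \mid i \in I\} \end{cases}$$

$$\mathit{leaf}(t) = \begin{cases} \{t\} & \text{if } t \in Q \\ \bigcup_{i \in I} \mathit{leaf}(t_i) & \text{if } t = \langle a_i : t_i \mid i \in I \rangle \end{cases}$$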


The $\mathit{cycle}_M(?, q)$ condition (also used in [5]) ensures that $\mathit{inTree}_M(q)$, if defined, is 
finite. Note that $a_i : \mathit{inTree}_M(q_i)$ is well-defined in the above. To see why, suppose 
$\delta(q, ?a_i) = q_i$. Observe that if $\neg\mathit{cycle}_M(?, q)$ then $\neg\mathit{cycle}_M(?, q_i)$. Repeating this 
argument it follows that $\mathit{inTree}_M(q_i) \neq \bot$, as required. 
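
As a hedged illustration of Definitions 2 and 4, the sketch below computes input trees for a machine given only by its receive transitions. The encoding is an assumption made here for brevity: `recv[q][a]` is the target of the transition from `q` on `?a`, a tree is either a state name or a dictionary from letters to subtrees, and `None` plays the role of $\bot$. The table `RECV` lists only those receive transitions of $N_2$ that can be read off the input trees reported in Example 1; send transitions are omitted because the construction ignores them.

```python
# Receive transitions of N2 as far as Example 1 reveals them (assumed encoding).
RECV = {"q0": {"a": "q1", "c": "q5"}, "q1": {"a": "q2", "c": "q3"}}

def all_states(recv):
    return set(recv) | {t for succ in recv.values() for t in succ.values()}

def cycle_recv(recv, q, fuel=None):
    """cycle_M(?, q): a cycle of receives is reachable from q via receives only.
    A receive path with as many steps as there are states must repeat a state,
    so such a path exists iff a receive cycle is reachable."""
    if fuel is None:
        fuel = len(all_states(recv))
    if fuel == 0:
        return True
    return any(cycle_recv(recv, t, fuel - 1) for t in recv.get(q, {}).values())

def in_tree(recv, q):
    """inTree_M(q); None stands for the undefined tree (bottom)."""
    if cycle_recv(recv, q):
        return None
    if not recv.get(q):
        return q                      # in_M(q) is empty: the tree is the state
    return {a: in_tree(recv, t) for a, t in recv[q].items()}

def leaf(t):
    """leaf(t): the states found at the leaves of input tree t."""
    if isinstance(t, str):
        return {t}
    return set().union(*(leaf(sub) for sub in t.values()))
```

For instance, `in_tree(RECV, "q0")` yields the nested tree with branches `a` and `c` whose leaves are `q2`, `q3` and `q5`, while a machine with a receive self-loop yields `None`.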


Example 1 (Running example: input trees and leaves). The machines 14may2 
and 14may1 specified in Figure 2 originate from the GitHub repository which 
accompanies [5]. Henceforth let $N_1$ = 14may2 and $N_2$ = 14may1. 


$\mathit{inTree}_{N_2}(q_0) = \langle a : \langle a : q_2, c : q_3 \rangle, c : q_5 \rangle$    $\mathit{leaf}(\mathit{inTree}_{N_2}(q_0)) = \{q_2, q_3, q_5\}$ 
$\mathit{inTree}_{N_2}(q_1) = \langle a : q_2, c : q_3 \rangle$    $\mathit{leaf}(\mathit{inTree}_{N_2}(q_1)) = \{q_2, q_3\}$ 
$\mathit{inTree}_{N_2}(q_i) = q_i$ for all $2 \leq i \leq 6$    $\mathit{leaf}(q_3) = \{q_3\}$ 

Fig. 2. Communicating machines $N_1$ (14may2) and $N_2$ (14may1) 


Next, we introduce a substitution $\theta$ that we use, in the definition of 
asynchronous subtyping, to model the accumulation of inputs as simulation unfolds. 
Input trees are extended at their leaves by the application of a substitution $\theta$. 


Definition 5 (Substitution). If $q_i \in Q$ and $t_i \in T_Q$ for all $i \in I$ then $\theta = 
\{q_i \mapsto t_i \mid i \in I\}$ denotes an operator $T_Q \to T_Q$ where $\theta(t)$ is the input tree 
obtained by simultaneously substituting each occurrence of $q_i$ in $t$ with $t_i$. 
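
A small sketch of substitution on input trees follows, under the assumption that a tree is either a state name or a dictionary from letters to subtrees (an encoding chosen here for illustration). The usage lines replay one accumulation step for the server $M_2$ of the introduction, assuming its transitions are $q_0 \xrightarrow{?a} q_1$ and $q_1 \xrightarrow{!b} q_0$ as its description suggests.

```python
def substitute(theta, t):
    """theta(t): simultaneously replace each leaf state q in dom(theta) by theta[q].
    Replaced subtrees are not traversed again, so the substitution is simultaneous."""
    if isinstance(t, str):
        return theta.get(t, t)
    return {a: substitute(theta, sub) for a, sub in t.items()}

# One accumulation step for M2 on a send of b (assumed transitions, see lead-in).
T2 = {"a": "q0"}                       # the input tree <a : q0>
in_tree_q0 = {"a": "q1"}               # inTree(q0) = <a : q1>
theta = {"q1": "q0"}                   # q1 --!b--> q0
kappa = {"q0": substitute(theta, in_tree_q0)}
T3 = substitute(kappa, T2)             # <a : <a : q0>>
```

The result `T3` is one level deeper than `T2`, which is the unbounded growth of input trees that the trace-based formulation is designed to tame.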


In Definition 6 we introduce the notion of an async subtyping relation be- 
tween states of a candidate subtype and input trees of a supertype. We follow 
[11] and, like [5], adopt the conventional orphan-free version of asynchronous 
subtyping [7, Definition 2.4] adapted to the setting of communicating machines: 


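Definition 6. An async subtyping relation for $M_1 = (P, p_0, \delta_1)$ and $M_2 = 
(Q, q_0, \delta_2)$ is a binary relation $R \subseteq P \times T_Q$ such that $(p, t) \in R$ implies: 


1. if $\mathit{final}_{M_1}(p)$ then $t = q$ for some $q \in Q$ and $\mathit{final}_{M_2}(q)$ 
2. if $\mathit{recv}_{M_1}(p)$ then 
a) if $t = q$ for some $q \in Q$ then $\mathit{recv}_{M_2}(q)$ and if $q \xrightarrow{?a} q'$ there exist $p \xrightarrow{?a} p'$ 
and $(p', q') \in R$ 
b) if $t = \langle a_i : t_i \mid i \in I \rangle$ then for all $i \in I$ there exist $p \xrightarrow{?a_i} p'$ and $(p', t_i) \in R$ 
3. if $\mathit{send}_{M_1}(p)$ then: 
a) if $t = q$ for some $q \in Q$ and $\mathit{send}_{M_2}(q)$ then if $p \xrightarrow{!a} p'$ there exist $q \xrightarrow{!a} q'$ 
and $(p', q') \in R$ 
b) otherwise if $\mathit{leaf}(t) = \{q_i \mid i \in I\}$ then 
i. $\neg\mathit{cycle}_{M_1}(!, p)$ 
ii. $t_i = \mathit{inTree}_{M_2}(q_i) \neq \bot$ for all $i \in I$ 
iii. if $p \xrightarrow{!a} p'$ and $\theta = \{q \mapsto q' \mid q \in Q, q \xrightarrow{!a} q'\}$ then $\mathit{leaf}(t_i) \subseteq \mathit{dom}(\theta)$ 
for all $i \in I$ and $(p', \kappa(t)) \in R$ where $\kappa = \{q_i \mapsto \theta(t_i) \mid i \in I\}$ 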


Case (1) is self-explanatory. Case (2) is for input actions in $M_1$ and realises 
contra-variance with respect to inputs. Case (2.a) applies when the states $p$ 
and $q$ are in sync, whereas case (2.b) applies when an accumulated input $a_i$ 
in $M_2$ is consumed by a corresponding input action of $M_1$. In case (2.a), 
condition $\mathit{recv}_{M_2}(q)$ ensures that the guarded clause 'if $q \xrightarrow{?a} q'$ ...' does not hold 
vacuously. Case (3) is for output actions in $M_1$ and implements output co-variance. 
Case (3.a) applies when $M_1$ and $M_2$ are in sync, while case (3.b) is for 
accumulated inputs. The negated $\mathit{cycle}_{M_1}$ predicate of clause (3.b.i) mirrors [5] and 
prevents orphan messages, ensuring that accumulated inputs are eventually 
considered. Clause (3.b.ii) was implicit in [5] but is used in the proofs for structuring, 
and is thus made explicit. Clause (3.b.iii) ensures that if $p$ in $M_1$ can send, then 
every leaf of the corresponding input tree $t$ in $M_2$ can make a matching send 
action. 


Definition 7 (Async Subtyping). $M_1 = (P, p_0, \delta_1)$ is an (async) subtype of 
$M_2 = (Q, q_0, \delta_2)$, written $M_1 \preceq M_2$, iff there exists an async subtyping relation 
$R \subseteq P \times T_Q$ for $M_1$ and $M_2$ such that $(p_0, q_0) \in R$. 


4  Asynchronous Subtyping with Input Traces 


Simulation trees [5] provide a foundation for checking subtyping, but because 
their branches can grow arbitrarily long, they are not tractable in themselves. 
To obtain a model which is amenable to abstraction, we substitute an input tree 
with a set of input traces. Sets of input traces can be easily relaxed by adding 
more input traces, which is key to deriving a finite alternative representation. 


Definition 8 (Input Traces). Given a fixed alphabet $A$ and a set of states $Q$, 
input traces (traces for short) are words formed from the alphabet $A$ (which are 
ranged over by $\pi$) followed by a state in $Q$: $\mathit{Tr}_Q = \{\pi \cdot q \mid \pi \in A^*, q \in Q\}$. The 
empty word is denoted $\epsilon$. 


The development begins by lifting a simulation tree to sets of traces, a construc- 
tion which itself requires some set-level auxiliary operations: 


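Definition 9 (Traces of an input tree). The set of traces of an input tree is 
given by the map $\mathit{tr} : T_Q \cup \{\bot\} \to \wp(\mathit{Tr}_Q)$ defined by: 

$$\mathit{tr}(t) = \begin{cases} \emptyset & \text{if } t = \bot \\ \{t\} & \text{if } t \in Q \\ \{a_i \cdot \pi \mid \pi \in \mathit{tr}(t_i), i \in I\} & \text{if } t = \langle a_i : t_i \mid i \in I \rangle \end{cases}$$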


Example 2 (Running example: traces). Continuing with $N_1$ and $N_2$ of Example 1 
(Figure 2), $\mathit{tr}(\mathit{inTree}_{N_2}(q_0)) = \{aaq_2, acq_3, cq_5\}$ and $\mathit{tr}(\mathit{inTree}_{N_2}(q_1)) = \{aq_2, cq_3\}$. 
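
The map from input trees to sets of traces can be sketched as follows. The encoding is assumed for illustration: a tree is a state name or a dictionary from single-character letters to subtrees, `None` stands for $\bot$, and a trace $\pi \cdot q$ is the pair `(word, state)`.

```python
def tr(t):
    """tr(t): one trace (word, state) for each branch through the input tree t."""
    if t is None:                      # the undefined tree has no traces
        return set()
    if isinstance(t, str):             # a state q is the trace with the empty word
        return {("", t)}
    return {(a + w, q) for a, sub in t.items() for (w, q) in tr(sub)}

# The two input trees of Example 1.
TREE_Q0 = {"a": {"a": "q2", "c": "q3"}, "c": "q5"}
TREE_Q1 = {"a": "q2", "c": "q3"}
```

Applied to `TREE_Q0` and `TREE_Q1`, `tr` returns the three and two traces listed in Example 2.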


4.1 Collecting simulation 


A (collecting) simulation tree is formulated in terms of a (collecting) simulation 
relation, defined below. The term collecting has been chosen to resonate with 
abstract interpretation [15] where a semantics is lifted to operate on sets of 
data points (to give a so-called collecting semantics) which provides a semantic 
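substrate for synthesising an algorithm. 

$$\frac{\mathit{in}_{M_2}(q) \subseteq \mathit{in}_{M_1}(p) \quad q \xrightarrow{?a} q'}{p \preceq q \xrightarrow{?a} \delta_{M_1}(p, ?a) \preceq q'}\ [\text{Recv}] \qquad \frac{\mathit{out}_{M_1}(p) \subseteq \mathit{out}_{M_2}(q) \quad p \xrightarrow{!a} p'}{p \preceq q \xrightarrow{!a} p' \preceq \delta_{M_2}(q, !a)}\ [\text{Send}]$$

$$\frac{a \in \mathit{in}_{M_1}(p)}{p \preceq a \cdot \pi \xrightarrow{?a} \delta_{M_1}(p, ?a) \preceq \pi}\ [\text{RecvTr}]$$

$$\frac{\neg\mathit{cycle}_{M_1}(!, p) \quad \mathit{tr}(\mathit{inTree}_{M_2}(q)) = \{\phi_i \cdot q_i \mid i \in I\} \quad k \in I \quad \forall i \in I : \mathit{out}_{M_1}(p) \subseteq \mathit{out}_{M_2}(q_i) \quad p \xrightarrow{!a} p'}{p \preceq \phi \cdot q \xrightarrow{!a} p' \preceq \phi \cdot \phi_k \cdot \delta_{M_2}(q_k, !a)}\ [\text{SendTr}]$$

$$\frac{\forall \pi \in S : \exists b \in A : p \preceq \pi \xrightarrow{?b} \quad S_a = \{\pi' \mid \pi \in S, p \preceq \pi \xrightarrow{?a} p' \preceq \pi'\} \neq \emptyset}{p \preceq S \xrightarrow{?a} p' \preceq S_a}\ [\text{RecvSet}]$$

$$\frac{\forall \pi \in S : p \preceq \pi \xrightarrow{!a} \quad S_a = \{\pi' \mid \pi \in S, p \preceq \pi \xrightarrow{!a} p' \preceq \pi'\} \neq \emptyset}{p \preceq S \xrightarrow{!a} p' \preceq S_a}\ [\text{SendSet}]$$

Fig. 3. Rules for trace-based asynchronous subtyping 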


Definition 10 (Collecting simulation). The collecting simulation relation of 
two machines $M_1 = (P, p_0, \delta_1)$ and $M_2 = (Q, q_0, \delta_2)$ is the least 5-place relation 
$\to\ \subseteq P \times \wp(\mathit{Tr}_Q) \times \mathcal{A} \times P \times \wp(\mathit{Tr}_Q)$ satisfying the rules in Figure 3, where 
$p \preceq S \xrightarrow{\ell} p' \preceq S'$ abbreviates $(p, S, \ell, p', S') \in\ \to$. 


In Figure 3, rules Recv and RecvTr collectively realise the second case of 
Definition 6: rule Recv realises case (2.a) for interactions in sync, and RecvTr realises 
case (2.b) that consumes an accumulated input. The contra-variance of receive 
manifests as $\mathit{in}_{M_2}(q) \subseteq \mathit{in}_{M_1}(p)$ in Recv and $a \in \mathit{in}_{M_1}(p)$ in RecvTr. Rules Send 
and SendTr realise case (3.a) and case (3.b) of Definition 6, respectively. In 
these rules, the co-variance of send appears as premise $\mathit{out}_{M_1}(p) \subseteq \mathit{out}_{M_2}(q)$ in 
Send and $\forall i \in I : \mathit{out}_{M_1}(p) \subseteq \mathit{out}_{M_2}(q_i)$ in SendTr. In rule SendTr, the $\mathit{leaf}(t_i) \subseteq 
\mathit{dom}(\theta)$ condition in case (3.b.iii) follows from the premise $\mathit{out}_{M_1}(p) \subseteq \mathit{out}_{M_2}(q_i)$ for 
all $i \in I$. To see this, let $q \in \mathit{leaf}(t_j)$ for some $j \in I$. Since $p \xrightarrow{!a} p'$, $a \in \mathit{out}_{M_1}(p)$, 
thus $a \in \mathit{out}_{M_2}(q)$ and therefore $q \in \mathit{dom}(\theta)$. 

The absence of mixed states (Definition 1) ensures that if both Send and 
SendTr are applicable then the traces which result coincide. The force of this is 
that clause 'otherwise if ...' of Definition 6(3.b) can be simplified to 'if ...' (so 
there is no need to prioritise the application of Send over SendTr). The current 
formulation of Definition 6(3.b) was chosen to align with that used in [5]. 

Rules RecvSet and SendSet lift subtyping from traces to sets of traces. In 
RecvSet, the first premise specifies a covering requirement: that a receive is 
possible for each trace of $S$. The second premise prescribes a grouping requirement: 
for a given receive action $?a$, it accumulates all those traces 
which can be derived by receiving on $a$. The requirement $S_a \neq \emptyset$ ensures that 
a non-empty subset of $S$ contributes to $S_a$. The $S_a \neq \emptyset$ requirement, which 
likewise shows up in SendSet, also inhibits meaningless transitions of the form 
$p \preceq \emptyset \xrightarrow{?a} p' \preceq \emptyset$ and $p \preceq \emptyset \xrightarrow{!a} p' \preceq \emptyset$, which would otherwise hold vacuously. 

For any given $p \preceq S$, relaxing $S$ to a larger set $T$ can result in either $p \preceq T$ becoming 
stuck, or a move that preserves the inclusion of traces. To formulate this property, 
let $p \preceq T \not\xrightarrow{\ell}$ denote the absence of a transition of the form $p \preceq T \xrightarrow{\ell} p' \preceq T'$. 


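Proposition 1 (Monotonicity). Let $T \subseteq S \subseteq \mathit{Tr}_Q$ and $\ell \in \mathcal{A}$. Then if 
$p \preceq T \xrightarrow{\ell} p' \preceq T'$ either: $p \preceq S \not\xrightarrow{\ell}$ or $p \preceq S \xrightarrow{\ell} p' \preceq S'$ where $T' \subseteq S'$. 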
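
The rules of Figure 3 and their lifting to sets can be explored on finite trace sets with the following sketch. Everything about the encoding is an assumption made for illustration: a machine is a dictionary `M[q][label]`, letters are single characters, and a trace is a pair `(word, state)`. `M2` is the news-feed server as described in the introduction, while `M1` is a hypothetical subtype that sends two feeds per request and is not claimed to be the machine drawn in the paper.

```python
M1 = {"p0": {"?a": "p1"}, "p1": {"!b": "p2"}, "p2": {"!b": "p0"}}  # hypothetical
M2 = {"q0": {"?a": "q1"}, "q1": {"!b": "q0"}}

def ins(M, q):
    return {l[1:] for l in M[q] if l[0] == "?"}

def outs(M, q):
    return {l[1:] for l in M[q] if l[0] == "!"}

def cycle(M, d, q, fuel=None):
    """cycle_M(d, q): a d-path of len(M) steps must revisit a state."""
    fuel = len(M) if fuel is None else fuel
    if fuel == 0:
        return True
    return any(cycle(M, d, t, fuel - 1) for l, t in M[q].items() if l[0] == d)

def traces_in(M, q):
    """tr(inTree_M(q)); None when the input tree is undefined."""
    if cycle(M, "?", q):
        return None
    if not ins(M, q):
        return {("", q)}
    return {(a + w, s) for a in ins(M, q) for (w, s) in traces_in(M, M[q]["?" + a])}

def step_trace(p, trace):
    """All (label, p', trace') derivable for a single trace."""
    w, q = trace
    res = set()
    if w == "":
        if ins(M2, q) <= ins(M1, p):                       # Recv
            for a in ins(M2, q):
                res.add(("?" + a, M1[p]["?" + a], ("", M2[q]["?" + a])))
        if outs(M1, p) <= outs(M2, q):                     # Send
            for a in outs(M1, p):
                res.add(("!" + a, M1[p]["!" + a], ("", M2[q]["!" + a])))
    elif w[0] in ins(M1, p):                               # RecvTr
        res.add(("?" + w[0], M1[p]["?" + w[0]], (w[1:], q)))
    leaves = traces_in(M2, q)
    if (not cycle(M1, "!", p) and leaves is not None
            and all(outs(M1, p) <= outs(M2, qi) for _, qi in leaves)):
        for a in outs(M1, p):                              # SendTr, one result per k
            for wk, qk in leaves:
                res.add(("!" + a, M1[p]["!" + a], (w + wk, M2[qk]["!" + a])))
    return res

def step_set(p, S):
    """RecvSet / SendSet: label -> (p', S_a). An empty result means stuck."""
    moves = [step_trace(p, t) for t in S]
    result = {}
    for lab in {l for ms in moves for (l, _, _) in ms}:
        if lab[0] == "?":   # covering: every trace admits some receive
            covered = all(any(l[0] == "?" for (l, _, _) in ms) for ms in moves)
        else:               # every trace admits this very send
            covered = all(any(l == lab for (l, _, _) in ms) for ms in moves)
        if covered:
            result[lab] = (M1[p][lab], {t for ms in moves for (l, _, t) in ms if l == lab})
    return result
```

Starting from `step_set("p0", {("", "q0")})` and following the single successor each time, the accumulated word grows from the empty word to `a`, then `aa`, and so on, which is the unbounded growth that relaxation is meant to curb. Enlarging the set at `p0` to `{("", "q0"), ("a", "q0")}` yields a successor set that contains the successor of each smaller set, in line with Proposition 1.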


4.2 Collecting simulation trees and graphs 


First, we provide an infinite model for collecting simulation using collecting 
simulation trees, that is an alternative presentation of simulation trees [5] where 
we represent the state of a supertype as a set of traces rather than an input tree. 


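Definition 11 (Collecting simulation (sim) tree). A collecting sim tree for 
$M_1 = (P, p_0, \delta_1)$ and $M_2 = (Q, q_0, \delta_2)$ is a labelled tree $(N, n_0, \overset{\ell}{\hookrightarrow}, \mathcal{L})$ where 
$\overset{\ell}{\hookrightarrow}\ \subseteq N \times N$ is a tree rooted at $n_0$ and $\mathcal{L} : N \to P \times \wp(\mathit{Tr}_Q)$ such that: 


1. $\mathcal{L}(n_0) = (p_0, \{q_0\})$ 

2. if $p \preceq S \xrightarrow{\ell} p' \preceq S'$ and $\mathcal{L}(n) = (p, S)$ then $n \overset{\ell}{\hookrightarrow} n'$ for some $n' \in N$ such 
that $\mathcal{L}(n') = (p', S')$ 

3. if $n \overset{\ell}{\hookrightarrow} n'$ and $\mathcal{L}(n) = (p, S)$ then $\mathcal{L}(n') = (p', S')$ such that $p \preceq S \xrightarrow{\ell} p' \preceq S'$ 

Case (2) above ensures that a collecting sim tree enumerates all the transitions 
of $\xrightarrow{\ell}$ whereas case (3) ensures that the tree only enumerates $\xrightarrow{\ell}$ transitions. 
Note that a collecting sim tree is unique up to tree isomorphism. 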

Theorem 1 shows that subtyping can be expressed in terms of successful 
branches (Definition 12) of collecting sim trees. 


Definition 12 (branches). A branch of a collecting sim tree $(N, n_0, \overset{\ell}{\hookrightarrow}, \mathcal{L})$ 
is a (possibly infinite) sequence $n_0, n_1, \ldots$ of nodes in $N$ such that $n_i \overset{\ell}{\hookrightarrow} n_{i+1}$ for all 
consecutive $n_i, n_{i+1}$. A complete branch of the collecting sim tree is a branch 
which is not a strict prefix of another branch of the collecting sim tree. A 
successful branch is a complete branch which is either infinite or whose last node $n$ 
is labelled $\mathcal{L}(n) = (p, F)$ with $F \subseteq Q$, $\mathit{final}_{M_1}(p)$, and $\mathit{final}_{M_2}(q)$ for all $q \in F$. 


The concept of successful branch allows for F to include multiple final states. 
This degree of generality supports supertypes with two or more final states (such 
as $q_4$ and $q_6$ of the machine $N_2$ of Example 1) when, later, successful branches 
are deployed in the context of collecting simulation graphs (see Figure 4). 


Theorem 1 (Equivalence). Let $(N, n_0, \overset{\ell}{\hookrightarrow}, \mathcal{L})$ be a collecting sim tree for 
$M_1 = (P, p_0, \delta_1)$ and $M_2 = (Q, q_0, \delta_2)$. $M_1 \preceq M_2$ iff every complete branch in 
$(N, \overset{\ell}{\hookrightarrow})$ is successful. 


Simulation trees and collecting simulation trees can grow without bound. 
However, growth can be curtailed by the judicious application of relaxation: 


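Definition 13 (Collecting simulation (sim) graph). A collecting sim graph 
for $M_1 = (P, p_0, \delta_1)$ and $M_2 = (Q, q_0, \delta_2)$ is a labelled graph $(N, n_0, \overset{\ell}{\hookrightarrow}, \mathcal{L})$ 
where $\overset{\ell}{\hookrightarrow}\ \subseteq N \times N$ is a graph rooted at $n_0$ and $\mathcal{L} : N \to P \times \wp(\mathit{Tr}_Q)$ such that: 


1. $\mathcal{L}(n_0) = (p_0, \{q_0\})$ 
2. if $p \preceq S \xrightarrow{\ell} p' \preceq T$ and $\mathcal{L}(n) = (p, S)$ then there exists $n' \in N$ such that 
$n \overset{\ell}{\hookrightarrow} n'$, $\mathcal{L}(n') = (p', S')$ for some $S' \supseteq T$ 
3. if $n \overset{\ell}{\hookrightarrow} n'$ and $\mathcal{L}(n) = (p, S)$ then $\mathcal{L}(n') = (p', S')$ such that $S' \supseteq T$ and 
$p \preceq S \xrightarrow{\ell} p' \preceq T$ 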


Relaxation manifests in case (2) of Definition 13 in that $S' \supseteq T$: $S'$ is thus a 
relaxation of $T$. Note too that $n'$ is not necessarily on the branch from $n_0$ to $n$. 
Case (3) ensures that each transition in a collecting sim graph has a counterpart 
in the collecting sim tree. 

The concepts of (complete and successful) branch can be defined analogously 
for a collecting sim graph. With these concepts in place, the following result, 
which is a consequence of Proposition 1, explains how a collecting sim graph 
simulates a collecting sim tree: each branch in the tree is described by a branch 
in the graph with possibly enlarged trace sets. This correspondence between a 
branch in the graph and a branch in the tree only holds if the branch in the 
collecting sim graph does not get stuck. 


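Corollary 1. Let $(N, n_0, \overset{\ell}{\hookrightarrow}, \mathcal{L})$ (resp. $(N', n'_0, \overset{\ell}{\hookrightarrow}{}', \mathcal{L}')$) be a collecting sim 
tree (resp. graph) for $M_1 = (P, p_0, \delta_1)$ and $M_2 = (Q, q_0, \delta_2)$. If $b = n_0 \cdots n_i$ 
is a branch in the tree $(N, \overset{\ell}{\hookrightarrow})$ then there exists $b' = n'_0 \cdots n'_k$ in the graph 
$(N', \overset{\ell}{\hookrightarrow}{}')$ with either: $k = i$ or $k < i$ and $n'_k \not\hookrightarrow{}'$. Moreover, $\mathcal{L}(n_j) = (p_j, S_j)$, 
$\mathcal{L}'(n'_j) = (p_j, S'_j)$ and $S_j \subseteq S'_j$ for all $j \leq k$. 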


Example 3. Figure 1 (left) shows an infinite simulation tree (following the 
notation of [5]) for machines $M_1$ and $M_2$ given in the introduction. The corresponding 
collecting sim tree has the same structure but $T_2 = \langle a : q_0 \rangle$ is substituted with 
$\{aq_0\}$, $T_3 = \langle a : \langle a : q_0 \rangle \rangle$ with $\{aaq_0\}$, whereas $q_0$ and $q_1$ (at and beneath the 
root of the tree) are replaced with $\{q_0\}$ and $\{q_1\}$ in the collecting sim tree. A 
(finite) collecting sim graph for $M_1$ and $M_2$ is shown in Figure 1 (right). Observe 
$q_0 \in S$, $q_1 \in S'$, $q_0 \in S$, $aq_0 \in S$, $aq_0 \in S'$, $aaq_0 \in S$, $aq_0 \in S$, $q_0 \in S'$, etc. 


The force of collecting sim graphs is that they still act as a vehicle for 
establishing asynchronous subtyping, as the following result asserts: 


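Theorem 2 (Soundness). Let $(N', n'_0, \overset{\ell}{\hookrightarrow}{}', \mathcal{L}')$ be a collecting sim graph for 
$M_1 = (P, p_0, \delta_1)$ and $M_2 = (Q, q_0, \delta_2)$. Then $M_1 \preceq M_2$ if every complete branch 
in $(N', \overset{\ell}{\hookrightarrow}{}')$ is successful. 


[Figure: a collecting sim graph whose nodes carry trace sets including 
$S_0 = \{q_0\}$, $S_1 = \{q_1\}$, $S_4 = \{\phi \cdot \pi \mid \phi \in \bigcup_{i \geq 0} A_i, \pi \in \{q_0, cq_3, cq_5\}\}$, $S_5 = \{q_1\} \cup S_4$, 
and the sets $\{q_3, q_5\}$ and $\{q_4, q_6\}$, 
where $A_0 = \{\epsilon\}$ and $A_{i+1} = \{a \cdot \pi \mid \pi \in A_i\}$.] 


Fig. 4. A collecting sim graph for $N_1$ and $N_2$ 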


Example 4 informally anticipates how finite representations of infinite execu- 
tions can be algorithmically computed (using regular expressions) ahead of the 
detailed presentation and evaluation of the algorithm in the following sections. 


Example 4 (Running example: collecting sim graph). Continuing with Example 1 
(Figure 2), $N_1$ and $N_2$ are examples of machines for which [5] cannot prove 
subtyping, even though it does hold. In contrast, Figure 4 presents a collecting 
sim graph showing $N_1 \preceq N_2$. The graph is rooted at $n_0$ where $\mathcal{L}(n_0) = (p_0, S_0)$. 


5 Async Subtyping with Regular Expressions 


Our work was motivated by the question of whether subtyping can be addressed 
with a simpler and more general approach. Beyond this conceptual question, 
there is the practical matter of whether our approach can algorithmically 
establish subtyping on more problems than before [4, 5]. To do so, we represent sets 
of traces using regular expressions and simulate the operations on sets of traces 
with analogous operations on regular expressions. To derive a finite collecting 
sim graph, we apply regular expression widening [12]. 


5.1 Representing sets of traces with regular expressions 


A set of traces can be represented as a finite set of regular expressions drawn 
from the syntactic category $\mathit{Reg}_A$ which is parameterised by alphabet $A$. $\mathit{Reg}_A$ 
is inductively defined as $\mathit{Reg}_A ::= \epsilon \mid C \mid r \cdot r' \mid r^*$ where $C \subseteq A$, $r, r' \in \mathit{Reg}_A$, and 
$\cdot$ is concatenation of words. To specify the language (set of words) represented 
by a regular expression, recall that the Kleene closure $W^*$ of a set of words $W$ is 
defined as $W^* = \bigcup_{i \geq 0} W_i$ where $W_0 = \{\epsilon\}$ and $W_{i+1} = \{w \cdot w' \mid w \in W, w' \in W_i\}$. 
Then the language of $r \in \mathit{Reg}_A$, denoted $[\![r]\!]$, is defined as $[\![\epsilon]\!] = \{\epsilon\}$, $[\![C]\!] = C$, 
$[\![r \cdot r']\!] = \{w \cdot w' \mid w \in [\![r]\!], w' \in [\![r']\!]\}$ and $[\![r^*]\!] = [\![r]\!]^*$. 

If $r \in \mathit{Reg}_A$ and $q \in Q$ the pair $(r, q)$ represents the set of traces $[\![(r, q)]\!] = 
\{\pi \cdot q \mid \pi \in [\![r]\!]\}$. Furthermore, if $R \subseteq \mathit{Reg}_A \times Q$ then $R$ represents the traces 
$[\![R]\!] = \bigcup \{[\![(r, q)]\!] \mid (r, q) \in R\}$. Henceforth $rq$ will abbreviate the pair $(r, q)$. 

Example 5. To illustrate, $[\![\{a^* q_0, cq_3\}]\!] = \{cq_3\} \cup \{\pi \cdot q_0 \mid \pi \in \bigcup_{i \geq 0} A_i\}$ with $A_i$ 
defined as in Figure 4. 
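
Membership of a trace in $[\![R]\!]$ can be checked mechanically, as in the following sketch. It assumes regular expressions of $\mathit{Reg}_A$ are encoded as nested tuples and that letters are single characters, and it delegates matching to Python's standard `re` module; none of this is taken from the paper's tool.

```python
import re

# Assumed encoding of Reg_A: ("eps",), ("set", "abc"), ("cat", r, s), ("star", r).
def pattern(r):
    """Translate an encoded expression into a Python regular expression."""
    tag = r[0]
    if tag == "eps":
        return ""
    if tag == "set":
        # [[C]] = C; the empty set of letters matches no word at all.
        return "[" + "".join(re.escape(c) for c in r[1]) + "]" if r[1] else "(?!)"
    if tag == "cat":
        return pattern(r[1]) + pattern(r[2])
    if tag == "star":
        return "(?:" + pattern(r[1]) + ")*"
    raise ValueError(tag)

def member(R, word, state):
    """Does the trace word.state belong to [[R]], for R a set of (r, q) pairs?"""
    return any(q == state and re.fullmatch(pattern(r), word) is not None
               for (r, q) in R)

# The set {a* q0, c q3} of Example 5.
R = {(("star", ("set", "a")), "q0"), (("set", "c"), "q3")}
```

With this `R`, the traces $q_0$, $aaaq_0$ and $cq_3$ are members, whereas $acq_3$ is not.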


Our technique uses the existing notion of widening [15, 16] to approximate 
regular expressions, namely to relax a sequence of regular expressions to derive 
another sequence which is not strictly increasing (thereby inducing convergence): 


Definition 14. An operation $\nabla : \mathit{Reg}_A \times \mathit{Reg}_A \to \mathit{Reg}_A$ is a widening iff given 
a sequence $s_0, s_1, \ldots \in \mathit{Reg}_A$ such that $[\![s_i]\!] \subseteq [\![s_{i+1}]\!]$ for all $i \geq 0$, the (widened) 
sequence $w_0 = s_0$ and $w_{i+1} = w_i \nabla s_{i+1}$ satisfies the following properties: 

- $[\![s_i]\!] \subseteq [\![w_i]\!]$ and $[\![w_i]\!] \subseteq [\![w_{i+1}]\!]$ for all $i \geq 0$ 

- the sequence $[\![w_0]\!], [\![w_1]\!], \ldots$ is not strictly increasing 

Our approach is parametric on the widening (of which there are many [14]). We 
provide a primer on (string) widening to keep the presentation self-contained. 


5.2 Widening regular expressions (a self-contained primer) 


The intuition behind the widening we adopt [12] is to preserve commonality 
across two regular expressions and resolve any difference using Kleene star for 
relaxation. The widening scans both expressions left-to-right and, as it does so, 
it partitions each expression into a prefix p which has been traversed and a suffix 
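$s$ which is yet to be considered. The state of the scan is thus represented by a pair 
$(p, s)$, with $\mathit{widen}_k$ operating on two such pairs simultaneously: 


$\mathit{widen}_k((p, \epsilon), (p', s')) = \mathit{mash}_k(p, p' \cdot s')$    $\mathit{widen}_k((p, s), (p', \epsilon)) = \mathit{mash}_k(p \cdot s, p')$ 

$$\mathit{widen}_k((p, q \cdot s), (p', q' \cdot s')) = \begin{cases} \mathit{mash}_k(p, p') \circ q \circ \mathit{widen}_k((\epsilon, s), (\epsilon, s')) & \text{if } q = q' \text{ and } sh(q) \leq k \\ \mathit{widen}_k((p \cdot q, s), (p' \cdot q', s')) & \text{if } q = q' \text{ and } sh(q) > k \\ \mathit{widen}_k((p \cdot q, s), (p', q' \cdot s')) & \text{if } q \neq q' \text{ and } |s| > |s'| \\ \mathit{widen}_k((p, q \cdot s), (p' \cdot q', s')) & \text{if } q \neq q' \text{ and } |s| \leq |s'| \end{cases}$$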


The widening is defined in terms of two notions of size: (1) star height, defined 
$sh(\epsilon) = sh(C) = 0$, $sh(r^*) = sh(r) + 1$ and $sh(r \cdot s) = \max(sh(r), sh(s))$; (2) 
star length, defined $|\epsilon| = 0$, $|C| = |r^*| = 1$ and $|r \cdot s| = |r| + |s|$. Given two 
expressions $r$ and $s$, the auxiliary $\mathit{mash}_k(r, s)$ computes a relaxation of $r$ and 
$s$ such that $sh(\mathit{mash}_k(r, s)) \leq k$ where $k$ is a predefined depth bound. Thus 
$[\![r]\!] \subseteq [\![\mathit{mash}_k(r, s)]\!]$ and $[\![s]\!] \subseteq [\![\mathit{mash}_k(r, s)]\!]$. 

Now consider scans of the form $(p, q \cdot s)$ and $(p', q' \cdot s')$ where $q$ and $q'$ are 
sub-expressions of the form $C$ or $r^*$. If $q = q'$ then the common $q$ is preserved 
provided $sh(q) \leq k$ and widening continues with scans $(\epsilon, s)$ and $(\epsilon, s')$. 
Operator $\circ$ is concatenation followed by a normalisation step [12] which ensures 
that no consecutive stars are introduced. If $sh(q) > k$ both $q$ and $q'$ are 
appended onto $p$ and $p'$ to be relaxed subsequently by $\mathit{mash}_k$. If $q \neq q'$ either 
$q$ or $q'$ is appended onto its prefix depending on whether $|s| > |s'|$, so that the 
remaining suffixes are closer in length (which is merely a heuristic for 
improving their similarity). Analogous to $\mathit{mash}_k$, $\mathit{widen}_k((p, s), (p', s'))$ relaxes $p \cdot s$ and 
$p' \cdot s'$ such that $sh(\mathit{widen}_k((p, s), (p', s'))) \leq k$. The star height bound ensures that 
$r \nabla s = \mathit{widen}_k((\epsilon, r), (\epsilon, s))$ yields a sequence which is not strictly increasing [12]. 
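
The two size measures are simple structural recursions, sketched below under the assumption that expressions of $\mathit{Reg}_A$ are encoded as nested tuples (an illustrative encoding, not the paper's). The sample expression `WIDENED` encodes $a \cdot b^* \cdot c \cdot d^*$.

```python
# Assumed encoding of Reg_A: ("eps",), ("set", "abc"), ("cat", r, s), ("star", r).
def sh(r):
    """Star height: sh(eps) = sh(C) = 0, sh(r*) = sh(r) + 1, sh(r.s) = max."""
    tag = r[0]
    if tag in ("eps", "set"):
        return 0
    if tag == "star":
        return sh(r[1]) + 1
    return max(sh(r[1]), sh(r[2]))     # "cat"

def star_length(r):
    """Star length: |eps| = 0, |C| = |r*| = 1, |r.s| = |r| + |s|."""
    tag = r[0]
    if tag == "eps":
        return 0
    if tag in ("set", "star"):
        return 1
    return star_length(r[1]) + star_length(r[2])

def cat(*parts):
    """Right-nested concatenation of one or more expressions."""
    return parts[0] if len(parts) == 1 else ("cat", parts[0], cat(*parts[1:]))

WIDENED = cat(("set", "a"), ("star", ("set", "b")), ("set", "c"), ("star", ("set", "d")))
```

Here `sh(WIDENED)` is 1 and `star_length(WIDENED)` is 4, so the expression respects a depth bound of $k = 1$.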


Algorithm 1 Algorithm for async subtyping ($\to$ is defined in Figure 3) 
1: function SUBTYPE($M_1$, $M_2$, $\Delta$)    // $M_1 = (P, p_0, \delta_1)$, $M_2 = (Q, q_0, \delta_2)$ 
2:   for ($p \in P$) do 
3:     if ($\Delta(p) \neq \emptyset \wedge p \preceq \Delta(p) \not\to$) then return maybe 
4:     $R_p := \bigcup_{p' \in P} \{R \mid \exists \ell .\ p' \preceq \Delta(p') \xrightarrow{\ell} p \preceq R\}$ 
       $\Delta'(p) :=$ if ($p \in wp$) then $\Delta(p) \nabla R_p$ else $\Delta(p) \cup R_p$ 
5:   if ($\Delta' \sqsubseteq \Delta$) then return $\Delta$ 
6:   return SUBTYPE($M_1$, $M_2$, $\Delta'$) 


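Example 6. For brevity, we refer the reader to [12] for a definition and 
commentary on the auxiliary $\mathit{mash}_k(r, s)$ but note that $\mathit{mash}_k(r, \epsilon) = r^*$ if $sh(r^*) \leq k$ 
and conversely $\mathit{mash}_k(\epsilon, s) = s^*$ if $sh(s^*) \leq k$. Hence 

$$\begin{aligned} (a \cdot c \cdot d) \nabla (a \cdot b \cdot c) &= \mathit{widen}_1((\epsilon, a \cdot c \cdot d), (\epsilon, a \cdot b \cdot c)) \\ &= \epsilon \cdot a \cdot \mathit{widen}_1((\epsilon, c \cdot d), (\epsilon, b \cdot c)) \\ &= \epsilon \cdot a \cdot \mathit{mash}_1(\epsilon, b) \cdot c \cdot \mathit{widen}_1((\epsilon, d), (\epsilon, \epsilon)) \\ &= \epsilon \cdot a \cdot b^* \cdot c \cdot \mathit{mash}_1(d, \epsilon) = \epsilon \cdot a \cdot b^* \cdot c \cdot d^* \end{aligned}$$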


The widening can be lifted from a pair of regular expressions to a pair of sets of 
regular expressions in a point-wise fashion [12]. In our setting, regular expressions 
represent traces, where each trace takes the form rq, and thus it is natural to 
partition a set of traces according to the state q in which they end. Two sets of 
expressions can be widened point-wise, for each q separately. 


5.3 Computing a collecting sim graph with regular expressions 


Before outlining the algorithm, we illustrate it by example. Example 7 revisits 
Example 4 and shows how the sets of traces in Figure 4 can be algorithmically 
generated by using regular expressions and widening in tandem. 


Example 7. Figure 5 presents a collecting sim graph for N4 € N2. Some nodes 
are shadowed by grey nodes that elaborate their relaxations by widening or 
union. The construction of the graph commences at node for po < Ro and pro- 
ceeds iteratively, the number to the top-right of a node indicating the iteration 
at which that node is added to the graph. Iteration 1 is computed merely using 
the rules of Figure 3. On iteration 2, po < Re is computed, again using the rules. 
Since po was visited before, to ensure that po is not revisited ad infinitum, a 
relaxation is applied, denoted V following [15,16], which relaxes Rə using Ro to 
obtain R5. Observe how [Ro] € [£5] and [Re] € [R3] but crucially the regular 
expression R5 is computed using a (widening) algorithm [12] which ensures that 
only a finite number of regular expressions are ever generated for po. Not all 
nodes of Figure 5 need to be relaxed using widening. On iteration 3, pı is revis- 
ited. In this case, R} is derived from Rg and Rı by computing their union. Thus 
again [Ri] € [R5] and [Rs] € [R5]. The general strategy is to apply widening 
only as required, namely on a set of nodes which cut any cycle [3]. The machine 
N, of Figure 2 has a single cycle through po and p1, thus it is sufficient to widen 

Fig. 5. A collecting simulation graph for proving N1 ≤ N2: reprise


at either p0 or p1. We elect to widen at p0, whereas for all other nodes of N2, the
relaxation is union. On iteration 5, p1 ≼ R5 is computed as before, the union of
R5 with R'3 being R'5. The following iteration derives a regular expression
R which is subsumed by R4, that is, p1 ≼ R'5 → p0 ≼ R where ⟦R⟧ ⊆ ⟦R4⟧. Thus
the graph is no longer developed along the cycle. Despite employing relaxation,
the set of traces reached at p3 only contains q4 and q6, for which final_N2(q4) and
final_N2(q6) hold. Recall final_N1(p3) holds, hence subtyping is demonstrated.


Our SUBTYPE algorithm takes as input two machines M1 = (P, p0, δ1) and
M2 = (Q, q0, δ2) and is parametric on: (1) a widening ∇ : ℘(Reg_A) × ℘(Reg_A) →
℘(Reg_A) and (2) a set wp ⊆ P of widening points. At least one state of wp must
appear in any cycle of M1, a condition which is sufficient for widening to induce
termination [3]. The mapping A : P → ℘(Reg_A × Q) represents the nodes of an
evolving collecting sim graph: SUBTYPE(M1, M2, A) is initially primed with A =
λp. if (p = p0) then {(ε, q0)} else ∅. In line 3, maybe is returned if the simulation
gets stuck. Note that p ≼ R → p' ≼ R' abbreviates p ≼ ⟦R⟧ → p' ≼ ⟦R'⟧ and
likewise p ≼ R ↛ abbreviates p ≼ ⟦R⟧ ↛. In line 4, R_p collects all the (r, q) pairs
reachable at p in the current iteration A. A(p) is then relaxed to A'(p) applying
widening if p ∈ wp and union otherwise. In line 5, A' ⊑ A iff ⟦A'(p)⟧ ⊆ ⟦A(p)⟧
for all p ∈ P. This check determines whether a fix-point is reached: if so the

M1             M2             |M1|  |M2|  [5]  regex  time
ctxta1         ctxta2            7     5   ✗     ✓      110
ctxtb1         ctxtb2            6     7   ✗     ✓       41
14may2         14may1            4     7   ✗     ✓       10
badseq1        badseq2           5    12   ✗     ✓     1127
march3testa1   march3testa2      6     7   ✗     ✓      222
aaaaaab1       aaaaaab2          5     3   ✗     ✓       43
ex1okloop      ex2okloop        10     8   ✗     ✓     1757
march3testa1   march3testb2      6    10   ✗     ✗        8


Fig. 6. Comparison of subtyping experiments: success rates and execution time (in ms)


algorithm returns A. SUBTYPING is sound and, due to widening, is guaranteed
to terminate. In short, if SUBTYPING returns A then M1 ≤ M2, otherwise it
returns maybe and the subtyping check is deemed inconclusive.
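
The relaxation step and the fix-point check described above can be sketched as follows. This is a simplified illustration under stated assumptions: entries are finite sets of strings standing in for sets of (r, q) pairs, plain set inclusion stands in for the language inclusion ⟦A'(p)⟧ ⊆ ⟦A(p)⟧, and the names `RelaxationStep`, `relax` and `subsumed` are hypothetical rather than taken from the tool.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.BinaryOperator;

final class RelaxationStep {
    /**
     * Relaxes the stored entry for state p with the newly collected entry:
     * the supplied widening at widening points, plain union everywhere else.
     */
    static Set<String> relax(String p, Set<String> stored, Set<String> collected,
                             Set<String> wideningPoints, BinaryOperator<Set<String>> widen) {
        if (wideningPoints.contains(p)) {
            return widen.apply(stored, collected);
        }
        Set<String> union = new TreeSet<>(stored);
        union.addAll(collected);
        return union;
    }

    /** Fix-point test: every entry of 'next' is already covered by 'current'. */
    static boolean subsumed(Map<String, Set<String>> next, Map<String, Set<String>> current) {
        for (Map.Entry<String, Set<String>> e : next.entrySet()) {
            Set<String> covering = current.getOrDefault(e.getKey(), Set.of());
            if (!covering.containsAll(e.getValue())) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Set<String> wp = Set.of("p0");
        // Placeholder widening that jumps straight to a single summary element.
        BinaryOperator<Set<String>> toyWiden = (x, y) -> Set.of("a*q0");
        System.out.println(relax("p0", Set.of("q0"), Set.of("aq0"), wp, toyWiden));
        System.out.println(relax("p1", Set.of("q1"), Set.of("a*q0"), wp, toyWiden));
    }
}
```

Keeping the widening as a parameter mirrors the way the algorithm is parametric on ∇ and on the set wp of widening points.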

For complexity, observe that wp can be chosen so that each state of P \ wp 
has at most one incoming edge. Then algorithm 1 updates each state of P at 
most (c|Q|)!”?! times, updating A at most |P|(c|Q|)!”?! times, where c bounds the 
number of times a regular string can be relaxed. But c € (2|Q|)F? . 


5.4 Implementation and benchmarking 


If successful, our tool generates a collecting sim graph (in the form of A) which
provides a concrete artifact that certifies subtyping. The regular expression-
based subtyping algorithm has been implemented in Scala 3.2.2 and evaluated on a
laptop running Ubuntu 22.04.2 with 32 GB of DDR3 and a 2.8GHz Intel i7 processor.
The code base is 1059 LOC, making use of parser combinators and the mutable
and immutable Set libraries. No attempt has been made to improve the iteration
strategy (which is normally a source of speedups). The tool and benchmarks
are available at https://github.com/murgia88/AsynchSubtypingRegex. The
benchmarks³ consist of 175 pairs of session types: 83 pairs where one type is
known to be a subtype of the other (the positive problems); and 92 pairs which
are known not to be in a subtyping relation (the negative problems).

A successful run of this kind is a positive outcome; alternatively, the algorithm
terminates with an inconclusive verdict. We have applied our tool to all the
subtyping problems in the benchmarking suite. Our tool gave a positive outcome
for 82 of them, whereas the tool in [4, 5] gave 75 positive outcomes. In addition
to certifying all positive cases in [4, 5], the tool could certify 7 “complex
accumulation [input tree] patterns” [5] that were inconclusive cases in previous
work. All 92 negative problems were (rightly) categorised as inconclusive by our tool.

An analysis of the 7 complex accumulation patterns is summarised in Fig-
ure 6. The M1 (resp. M2) column gives the candidate subtype (resp. type). To
convey some indication of the size of the problems, the |M1| (resp. |M2|) column
gives the number of states in M1 (resp. M2). The [5] column indicates whether
subtyping can be proven using the algorithm of [5] using their distribution. The
regex column indicates whether subtyping can be proven using collecting sim
graphs instantiated with regular expressions, as proposed in our work. Time is
wall-clock time measured in milliseconds, the median of 5 runs. Widening was
performed with a maximum star height of just 1 (k = 1). The last example in
Figure 6, march3testa1 ≤ march3testb2, is known to be positive but neither our
tool nor the one in [4, 5] could prove it. Nevertheless, it is remarkable that the
widening of [12] performs so well considering it was originally devised for
extracting SQL queries from database application programs.


3 The suite is based on the benchmark in [4, 5] with the addition of one (positive) case
that is used in [4, 5] as a running example.

The certificate produced by the algorithm (in the form of A) can be checked 
against the rules of Figure 3, without using widening or iteration. This could 
conceivably be performed by a proof assistant for high-assurance applications. 

We finally comment on one complex example, march3testa1 ≤ march3testb2,
that neither our tool nor the one in [4, 5] could prove. A post mortem reveals
that p4 ≼ S4 gets stuck: traces of S4 of the form brq3 cannot make any move, thus
RecvSet does not apply. However, brq3 originates from {a, b}*q3 in p0 ≼ S0, which
itself stems from (εq3 ∇1 aq3) ∇1 b(εq3 ∇1 aq3). Setting k = 2 (or higher) does not
remedy the problem, which suggests that the widening needs tuning. Indeed,
replacing {a, b}*q3 in S0 with a more nuanced relaxation, namely (a*(ba)*a*)*q3,
is sufficient to establish subtyping. Crucially, this shows that the problem does
not lie in collecting sim graph construction itself but in the widening (something
which can be tuned without change to the underlying framework).


6 Conclusion and Related Work 


We presented an algorithm for (binary) asynchronous session subtyping based on 
the application of abstract interpretation to session types. Our approach centres 
on the use of sets of traces to obtain a tractable representation of input trees. 
Sets of traces allow us to separate the proof for correctness of the core algorithm, 
from the problem of how to finitely represent and manipulate traces. This sep- 
aration makes the methodology modular and tunable. As well as providing a 
conceptually simple approach for proving subtyping, the resulting algorithm, 
when instantiated with an off-the-shelf string widening, can prove subtyping for 
rich forms of interaction that were previously out-of-reach [5]. From a large suite
of benchmarks, our algorithm was able to verify subtyping for all but one problem
and, even for that, we have shown that the collecting simulation approach is still
adequate for proving subtyping. These results show that abstract interpretation
is a clean, useful and powerful vehicle for inferring subtyping. Furthermore, a
collecting sim graph once obtained constitutes a certificate for validating subtyp-
ing. The certificate can then be checked by a third party, without consideration
for how the graph is actually derived (whether algorithmically or manually).


Related work Async subtyping was first explored in [29] where subtyping rules 
consider a restricted form of permutation on actions. These concepts were then 
refined [10, 11] to disallow orphan messages, a requirement adopted in [5] and
inherited into our study for ease of comparison. 

Since async subtyping is undecidable [6, 27], some works proposed decidable 
safe approximated algorithms. For instance, subtyping can be approximated by 
k-bounded asynchronous subtyping [7]. The state of the art is [4, 5] that inspired 
our work. Fragments of session types for which async subtyping is decidable
include: alternating session types [7] and single-out (resp. single-in) types [7] where
internal (resp. external) choices are singletons. 

Fair subtyping [9, 33] is an alternative to standard subtyping that preserves 
the possibility of correct termination. Asynchronous fair subtyping [8] is unde- 
cidable, and a sound algorithm has been proposed [8], which extends [5]. We 
would expect trace relaxation to extend to this setting as well. 

The work above mostly focuses on binary sessions. The subtyping algorithm 
of [17], instead, focuses on the more general case of async multiparty subtyping. 
When restricted to binary types, the algorithm in [17] is less powerful than both 
[5] and our algorithm. The last case of [17, Table 1], taken from the running 
example in [5], is undetected with deadlock-free subtyping [17] but is proven 
by [5] and ourselves (see case ‘sub-runningex ≤ sup-runningex’ in
https://github.com/murgia88/AsynchSubtypingRegex). [17] is still able to establish
subtyping for several realistic protocols. A precise definition of async multiparty 
subtyping (AMS) has been provided in Ghilezan et al. [22]. This means that 
AMS in [22] is sound and complete with respect to async multiparty typing 
with a subsumption rule. Such a definition is not obviously useful for algorithmic
purposes: it contains quantifications over uncountably infinite sets. Application 
of our methodology to AMS is an interesting future direction. 
Abstract. Since its inception two decades ago, SOOT has become one of
the most widely used open-source static analysis frameworks. Over time 
it has been extended with the contributions of countless researchers. Yet, 
at the same time, the requirements for SOOT have changed over the years 
and become increasingly at odds with some of the major design decisions 
that underlie it. In this work, we thus present SOOTUP, a complete reim- 
plementation of SOOT that seeks to fulfill these requirements with a novel 
design, while at the same time keeping elements that SOOT users have 
grown accustomed to. 


Keywords: Static program analysis - Soot - SootUp. 


1 Introduction 


SOOT is a program analysis framework for Java and Android. It has been pop- 
ular in academia for prototyping novel static and dynamic analysis approaches, 
many of which have been published at international conferences [I][3[5l|6] [14] 
[ipai] pe3l [29]. In 2000 [30], SOOT was introduced as an optimization frame- 
work for Java. Back then, when just-in-time compilers were still in their infancy, 
ahead-of-time optimization of Java code was a major field of research. Over the 
years, the research community's interest has been dominantly shifting to static 
code analysis, for diverse purposes. SOOT remained relevant due to some of its 
strengths, particularly its popular intermediate representations. 

One of the core features of SOOT is its main intermediate representation 
(IR), JIMPLE [31]. When seeking to perform program analysis on Java, both
bytecode and source code are usually suboptimal representations to work with. 
Java bytecode represents a program to be executed, using a stack-based instruc- 
tion set. Java source code, on the other hand, represents it on a higher level, using 
nested scopes and control-flow constructs for better readability. SOOT's JIMPLE
IR is a so-called three-address code representation that combines the best of
both worlds: it uses local variables instead of a stack. This simplifies data-flow
equations because all values that an operation consumes or produces are readily
accessible through its operands. It also uses explicit control flow without nesting,
i.e., solely through conditional or unconditional gotos. As a result, every JIMPLE
instruction is atomic; there can be no nesting. Complex source-code statements,
which perform multiple consecutive operations, e.g. a numerical computation
with a subsequent cast, are broken down into multiple individual IR instruc-
tions. This enables the creation of simple control flow graphs (CFGs), which one
can then use to analyze a method's control and data flow with relative ease.
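
The effect of the three-address form can be illustrated in plain Java. The sketch below is not JIMPLE syntax; it only mimics the decomposition of a nested statement (a numerical computation with a subsequent cast) into one operation per statement. The class name `ThreeAddressDemo` and the temporaries `t1`, `t2`, `t3` are hypothetical.

```java
final class ThreeAddressDemo {
    /** Source-level form: one statement nests a multiplication, an addition and a cast. */
    static int nested(double a, double b, double c) {
        return (int) (a * b + c);
    }

    /** Three-address style: each statement performs a single operation on named locals. */
    static int flattened(double a, double b, double c) {
        double t1 = a * b;   // first arithmetic operation
        double t2 = t1 + c;  // second arithmetic operation
        int t3 = (int) t2;   // the cast becomes an instruction of its own
        return t3;
    }

    public static void main(String[] args) {
        System.out.println(nested(2.5, 4.0, 1.75));
        System.out.println(flattened(2.5, 4.0, 1.75));
    }
}
```

Because every intermediate value has a name, a data-flow analysis can attach facts to `t1`, `t2` and `t3` directly instead of reasoning about an implicit operand stack.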


Furthermore, SOOT offers multiple algorithms, with varying degrees of pre-
cision and complexity, for constructing call graphs. A call graph is an essential
data structure for performing inter-procedural static analysis, as it models how a
program's methods call one another. For object-oriented programming languages
like Java, call graph construction is particularly challenging. This is because in 
Java method calls are virtual by default, in which case their call target is de- 
pendent on an object’s runtime type. A reference variable's declared type can 
only bound the possible call targets. To resolve call targets precisely one must 
compute all of the variable's possible runtime types. A popular way to do this is 
through pointer analysis. SOOT provides such call graph computation through 
its pointer analysis framework SPARK. 


Over the years, SOOT has frequently been extended to incorporate new fea- 
tures, and, in doing so, even early on it became clear that some of its design 
decisions were suboptimal, yet hard to remedy after the fact. For instance, SOOT 
has always been all-round monolithic. It heavily uses the singleton design pat- 
tern, causing strong coupling, and it always sought to be both a command line 
tool and a library, causing sometimes conflicting views on who owns the thread 
of control. In SOOT, everything can be accessed and manipulated via the single- 
ton "scene". This forbids keeping multiple scenes in memory, and any sensible 
parallelization. SOOT also contains many features that by now are considered ob- 
solete, e.g. other barely used IRs and an outdated source-code frontend, which 
are hard to remove without breaking useful but untested functionality. 


This paper presents SOOT’s successor framework SOOTUP. With SOOTUP,
we aim to keep the most important features of SOOT, yet to also overcome its 
major drawbacks. We designed SOOTUP as a modular library. This allows one 
to pick out the necessary modules for a specific use case. For instance, clients 
that only require bytecode analysis would add a dependency to the bytecode 
frontend module. This is possible due to SOOTUP’s core module being a generic 
implementation that allows plugging in frontends for arbitrary programming 
languages. Instead of a singleton scene object, SOOTUP introduces the concept 
of views, where each view may hold a different version of the analyzed program, 
or different programs altogether. To enable safe parallelization and caching, the 
new JIMPLE IR is immutable by default, allowing instrumentation only at certain 
safe points. At the time of writing, SOOTUP's most recent release is v1.1.2,¹ and
SOOTUP is open-sourced on GitHub.
To summarize, this paper presents the following contributions: 


- the design decisions behind SOOTUP’s architecture that accommodate cur-
rent research requirements,

- a demonstration of its new API, which aims for better usability,

- suggestions for SOOT-based analysis tools on how to switch to SOOTUP, and
the roadmap for further development of SOOTUP.


The remainder of this paper is organized as follows. In Section 2 we introduce
the design decisions that shaped SOOTUP. In Section 3 we demonstrate the new
API on example use cases. In Section 4 we list currently supported tools and
discuss how to upgrade tools to use SOOTUP. In Section 5 we explain SOOTUP’s
development process and how one can contribute to it. We present the future
work in Section 6, related work in Section 7, and conclude with Section 8.


2 Design Decisions 


We next discuss the main design decisions that underlie SOOTUP, and how they
address some of the major shortcomings of SOOT. We introduce the new archi- 
tecture and excerpts of the new API. 


2.1 Modular Architecture 


SOOTUP's most notable architectural difference from its predecessor is the clear 
separation of its components into independent modules. Figure 1 shows its archi-
tectural overview. One of the goals of the new architecture is to allow SOOTUP
to be used as a language-independent static analysis framework. It is not tightly 
coupled to any programming language. The most recent release (v1.1.2) in- 
cludes frontends for Java bytecode, Java source code and a now generic, i.e., 
language-independent form of JIMPLE. We delegate the language support to ex- 
ternal frontend providers and expect them to extend the generic JIMPLE. This is 
a significantly different mechanism than SOOT had offered for language support 
before. Previously, to analyze programs not in Java, one needed to convert their 
code to the (Java-specific) JIMPLE. With SOOTUP, instead one defines language- 
specific features by extending the core set of JIMPLE language constructs. 

The core module encapsulates the main functionality based on the generic 
JIMPLE. It defines the JIMPLE language constructs such as expressions, constants 
and statements. The statements make up control-flow graphs (CFGs), which may 
be forward, backward, mutable or immutable. The CFGs are representations for 
the bodies of SootMethods. SootMethods constitute SootClasses, the backbone 
of SOOTUP’s core object model. All of these objects are accessible through Views. 


1 https://doi.org/10.5281/zenodo.10037587

Fig. 1. Overview of SOOTUP's Architecture. White boxes are Java modules.


We have conceptualized the View as the main interface the user interacts 
with. In the case of a single view, this corresponds to the Scene object in SOOT. 
Because of the Scene's singleton nature, running multiple analyses simultane- 
ously was virtually impossible in SOOT [16]. SOOTUP overcomes this drawback 
by allowing as many Views as desired to co-exist. 

Additionally, SOOTUP comes with a new extensible Call Graph framework.
It allows plugging in arbitrary strategies for resolving virtual method dispatches. 
These strategies could vary, for instance, to optimize the precision or scalability, 
which are often tweaked using different Pointer Analysis algorithms. Interproce- 
dural Dataflow Analysis is one of the most successful methods for detecting bugs 
and security vulnerabilities. SOOTUP supports out-of-the-box context-sensitive 
data-flow analysis using the popular HEROS dataflow analysis framework. 


2.2 On-Demand Class Loading 


While SOOT loads all SootClasses that are referenced in a currently resolving
SootClass, SOOTUP is designed with a layer of indirection. SOOTUP makes use 
of identifiers to reference actual, possibly already loaded, instances of a respec- 
tive SootClass and stores those identifiers that reference other SootClasses, 
SootMethods or SootFields. This decreases unnecessary computations of un- 
used SootClasses, i.e. those which are referenced but whose contents are not of 
interest. Doing so, additionally, enables parallel class loading. Because the load- 
ing of a class does not depend on the loading of the classes that it references, 
each class can be loaded independently. As a side effect, this renders the concept
of phantom classes, known from SOOT, obsolete: their purpose was to create a
facade SootClass when the definition of a referenced SootClass is missing.

Fig. 2. SOOTUP's On-Demand Class Loading Mechanism


This case is now cleanly handled by the View, which simply returns no further
information. 
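
The idea of referencing classes by identifier and resolving them only on request can be sketched with a toy model. Everything in it is hypothetical (`LazyViewSketch`, `LoadedClass`, `lookup`); it is not SOOTUP's View implementation, only an illustration of lazy resolution with an empty result for missing definitions.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

final class LazyViewSketch {
    /** A resolved class: its name plus the names (identifiers) of the classes it references. */
    static final class LoadedClass {
        final String name;
        final List<String> referencedTypes;
        LoadedClass(String name, List<String> referencedTypes) {
            this.name = name;
            this.referencedTypes = referencedTypes;
        }
    }

    private final Function<String, LoadedClass> loader;  // returns null if no definition exists
    private final Map<String, LoadedClass> cache = new ConcurrentHashMap<>();
    private final AtomicInteger loaderCalls = new AtomicInteger();

    LazyViewSketch(Function<String, LoadedClass> loader) {
        this.loader = loader;
    }

    /** Resolves a class on demand; referenced classes are NOT loaded as a side effect. */
    Optional<LoadedClass> lookup(String typeName) {
        LoadedClass c = cache.computeIfAbsent(typeName, n -> {
            loaderCalls.incrementAndGet();
            return loader.apply(n);
        });
        return Optional.ofNullable(c);  // a missing definition yields an empty result
    }

    int loaderCallCount() {
        return loaderCalls.get();
    }

    public static void main(String[] args) {
        Map<String, LoadedClass> defs = Map.of(
            "A", new LoadedClass("A", List.of("B", "Missing")),
            "B", new LoadedClass("B", List.of()));
        LazyViewSketch view = new LazyViewSketch(defs::get);
        System.out.println(view.lookup("A").isPresent());        // A is resolved, B is untouched
        System.out.println(view.lookup("Missing").isPresent());  // no facade class is created
    }
}
```

Since resolving "A" never touches "B", independent lookups could run in parallel, which is the property the text attributes to the layer of indirection.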

Figure 2 models SOOTUP’s new on-demand class loading mechanism. The
View is the central access point that streamlines the resolving and caching pro- 
cess. The caching strategy can be configured by using one of the cache providers. 
FullCache is the default option, which suffices in most cases where the cache 
does not need to be freed. Alternatively LRUCache manages the cache based on 
the least recent use and MutableFullCache gives the control of the cache to 
the client. After obtaining a SootClass, by querying it with its unique identifier 
(ClassType) from the View, one can obtain its SootMethods and SootFields 
that are cached within the SootClass. 
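
To illustrate what a least-recently-used policy means in practice, the following sketch builds one on the standard `java.util.LinkedHashMap`. It is an assumption-laden illustration, not the code behind SOOTUP's LRUCache: the class name `LruCacheSketch` and its capacity parameter are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class LruCacheSketch<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruCacheSketch(int capacity) {
        super(16, 0.75f, true);  // accessOrder = true: entries are ordered by last access
        this.capacity = capacity;
    }

    /** Called by LinkedHashMap after each insertion; evicts the least recently used entry. */
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;
    }

    public static void main(String[] args) {
        LruCacheSketch<String, String> cache = new LruCacheSketch<>(2);
        cache.put("org.example.A", "class A");
        cache.put("org.example.B", "class B");
        cache.get("org.example.A");             // A becomes the most recently used entry
        cache.put("org.example.C", "class C");  // capacity exceeded: B is evicted
        System.out.println(cache.keySet());
    }
}
```

A full cache, by contrast, would simply never evict, which matches the remark that it suffices when the cache does not need to be freed.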


2.3 Focus on an Intuitive API 


SOOT's users often complain about a lack of documentation. Its issue tracker is
filled with “how to” questions. We believe the underlying problem is, primarily,
its complicated API design. Based on our past experience, an intuitive API design
has been a strong focus throughout the development of SOOTUP.

Figure 3 shows the process of setting up a Project, creating a View and
accessing a SootMethod object. First, users create an AnalysisInputLocation 
that points to a target program's path. Second, they create a Project by spec- 
ifying the target language. The Project can be used to create a View. At this 
Fig. 3. SOOTUP's API for Creating a View and Accessing a SootMethod


point, the View knows where the target program is located and which language 
frontend needs to be used to load its classes. 

The View loads the elements of the target program only when they are
queried, and memoizes them through configurable caching providers enabled
by the new immutable IR design. The memoization is fine-grained: it works
at the level of field, method, interface and modifier definitions. SOOTUP can
create references to all of these objects via a corresponding language-specific 
IdentifierFactory. The references, i.e., the identifiers, are then used to access 
the queried elements of the target program. 

Class types and signatures (for methods and fields) are considered global 
identifiers, across possibly concurrent instances of Projects and Views. They 
are created and pooled by the singleton instance of IdentifierFactory to re- 
duce memory consumption. Additionally, it is cheaper to invoke hashCode() and 
equals() on the identifiers than on the IR objects that the identifiers reference. 
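
The pooling of identifiers can be sketched as a simple interning table. The names below (`IdentifierPoolSketch`, `TypeId`, `classType`) are hypothetical and the sketch makes no claim about how SOOTUP's IdentifierFactory is implemented; it only shows why pooled identifiers are cheap to compare.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class IdentifierPoolSketch {
    /** A class-type identifier: just a fully qualified name, with no IR attached. */
    static final class TypeId {
        final String fullyQualifiedName;
        private TypeId(String fullyQualifiedName) {
            this.fullyQualifiedName = fullyQualifiedName;
        }
    }

    private static final Map<String, TypeId> POOL = new ConcurrentHashMap<>();

    /** Returns the single shared identifier for a name, creating it on first request. */
    static TypeId classType(String fullyQualifiedName) {
        return POOL.computeIfAbsent(fullyQualifiedName, TypeId::new);
    }

    public static void main(String[] args) {
        TypeId first = classType("org.example.Main");
        TypeId second = classType("org.example.Main");
        // Both requests yield the same object, so the default identity-based
        // equals() and hashCode() are sufficient and inexpensive.
        System.out.println(first == second);
    }
}
```

Because each name maps to exactly one object, memory is spent once per identifier regardless of how many views refer to it.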


2.4 Library by Default 


Soot had always been designed to be a standalone CLI (command-line interface) 
tool. This meant that it was expected to own the thread of control, which often 
hindered a tight integration of SOOT into integrated development environments 
(IDEs) or CI/CD pipelines, which are themselves frameworks and expect to 
own the thread of control as well. Also, a CLI aggregates all of the underlying 
functionality and makes it accessible via a single channel. This requires bundling 
everything together and contradicts our goals of providing lean modules. 
To avoid this, we have conceptualized SOOTUP as a library by default. In
SOOTUP, clients can depend on individual modules. For instance, to access the
CFGs of a compiled program's methods, one needs to add a dependency to the 
Java Bytecode Frontend and Core modules. Further module dependencies can 
gradually be added later on when needed. 

The library nature allows the clients to own the thread of control. This is 
preferable, especially, when using SOOTUP for other purposes than program 
analysis, or when using it as part of other analysis frameworks. SOOTUP also 
provides rather sophisticated functionality as a framework, with inversion of 
control, for instance when building call graphs or performing dataflow analyses. 

Yet, SOOTUP is not quite stateless. As shown in Figure 2, the state is man-
aged mainly by the IdentifierFactory and View. View instances keep ref-
erences to all the memoized objects; they are not garbage collected unless the
client releases the reference to the View. IdentifierFactory, on the other hand,
maintains the global state of unique identifiers statically. It is the only singleton 
in SOOTUP, which might be shared across different views. In other words, if the 
client terminates then only the state in the IdentifierFactory will be retained. 


2.5 Immutable IR by Design 


SOOT was designed as a program optimization tool. Its main purpose was to
enable the analysis and transformation of method bodies. As the research trend 
has shifted from program optimization to program analysis, we believe there is 
limited use in still maintaining mutable objects in a mutable IR. 

Mutable objects are not easily shared between several entities. One needs 
to constantly account for unintended changes. They very much complicate par- 
allelization at any level. To counter this problem, we have designed SOOTUP’s 
JIMPLE IR to be immutable by default. This assures that there are no accidental 
modifications and that values can be safely shared and cached. 


    class Body {

      // A wither returns a modified copy instead of mutating this Body.
      Body withStmts(List<Stmt> stmts) {
        return new Body(stmts);
      }
    }


Listing 1.1. Modifying a Method Body via Withers 


To ensure immutability we have slightly adjusted the API as well. Many
classes no longer have setters; they have withers instead. Withers still
allow modifications via new object copies with modified properties. Listing 1.1,
for instance, shows how one can still modify the statements of a method body.
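
A self-contained variant of the wither pattern makes the guarantee explicit: the original object is untouched after a "modification". The sketch uses plain strings instead of statement objects, and the class name `ImmutableBodySketch` is hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class ImmutableBodySketch {
    private final List<String> stmts;  // statements kept as plain strings for illustration

    ImmutableBodySketch(List<String> stmts) {
        // Defensive copy plus unmodifiable wrapper: no caller can change the list later.
        this.stmts = Collections.unmodifiableList(new ArrayList<>(stmts));
    }

    List<String> getStmts() {
        return stmts;
    }

    /** Wither: returns a new body with the given statements; 'this' stays as it was. */
    ImmutableBodySketch withStmts(List<String> newStmts) {
        return new ImmutableBodySketch(newStmts);
    }

    public static void main(String[] args) {
        ImmutableBodySketch original = new ImmutableBodySketch(List.of("x = 1", "return x"));
        ImmutableBodySketch changed = original.withStmts(List.of("return 1"));
        System.out.println(original.getStmts());  // still the two original statements
        System.out.println(changed.getStmts());
    }
}
```

Since neither object can change after construction, both can be shared between threads or kept in a cache without synchronization.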


2.6 Changes to Jimple 


Originally, JIMPLE was designed to be an IR for program optimization to fit
SOOT's primary use case. Since the purpose of SOOTUP has shifted towards
program analysis instead of optimization, we adjusted the JIMPLE IR towards
this purpose. For efficiency reasons, a Java compiler compiles any switch
statement to either a tableswitch or a lookupswitch bytecode instruction.
Since the distinction is needed to transform the optimized JIMPLE back to byte-
code, JIMPLE also made a distinction between tableswitch and lookupswitch
statements. However, virtually all program analyses will treat both kinds of
statements identically. Because of this, in SOOTUP both statements have been
merged into a single switch statement, simplifying analysis implementations.

Another novelty in SOOTUP's JIMPLE is the added support for language
extensibility. SOOTUP is designed to be an analysis framework that supports
not just Java but other programming languages as well. To allow for this
multi-language support, a basic JIMPLE IR has been implemented in a generic 
way that allows for easy extension with language-specific features. For the Java 
implementation, we extended this basic JIMPLE IR with import statements and 
annotations, two features that are highly specific to the Java language. Anno- 
tations are supported by extending JIMPLE's class type definition. Just like in 
Java source code, import statements improve the readability of Java-JIMPLE 
statements. Java-JIMPLE now allows referring to simple class names by defining
their fully qualified names as imports. Likewise, basic JIMPLE can be extended 
to support features specific to other languages, e.g. JavaScript or Python. 


3 Demonstration 


In Section 2.3 we provided a glimpse of the new API. In this section, we demon-
strate the new API with a set of the most common use cases.


3.1 Setup 


The code snippet in Listing 1.2 shows the starting point in SOOTUP to build
an analysis project. The project builder requires two inputs: (1) the language of 
code to be analyzed and its version, as SOOTUP supports multiple languages; (2) 
the location of the analysis target. In this example, we are setting the analysis 
language as Java with version 8 and adding a Java classpath analysis input loca- 
tion that points to the analysis target. Note that one can add multiple analysis 
input locations to the project builder. The Java bytecode frontend accepts any 
of the Java archive formats (JAR or WAR), Android packages (APK), ZIPs or 
individual .class files. The Java source and the JIMPLE frontends accept .java 
and .jimple files respectively. To resolve a given class, the view will inspect all 
of the given analysis input locations. 


    JavaLanguage language = new JavaLanguage(8);
    JavaProject project = JavaProject.builder(language)
        .addInputLocation(new JavaClassPathAnalysisInputLocation("/path"))
        .build();
    JavaView view = project.createView();


Listing 1.2. The creation of a view in SOOTUP 
3.2 Obtaining a Method Body


Assume the target code example in Listing 1.3. Following the API usage in
Section 3.1, next we need to obtain a reference to the target class. To do so,
as shown in Listing 1.4, we get the IdentifierFactory from the view at Line
6. We obtain the target class type at Line 7 and likewise the target method’s
signature at Line 8. A class is rather straightforward to identify, i.e. with a
string corresponding to its fully qualified name, e.g. "org.example.Main" in
this example.

Identifying methods requires a bit more information, as one needs to specify
its containing class type, name, return type and parameter list to uniquely
identify it. In this example, we use the target class type (ct) that we have
created, set the name as "run" and return type as "void". It is important to
refer to any class type with its fully qualified name. For instance, while in
Java it suffices to write String[] args to define the parameters as a string
array, SOOTUP needs the definition as java.lang.String[].


    package org.example;
    public class Main {
      void run(String[] args) {
      }
    }


Listing 1.3. Target code example


6   IdentifierFactory factory = view.getIdentifierFactory();
7   ClassType ct = factory.getClassType("org.example.Main");
8   MethodSignature mSig = factory.getMethodSignature(
9       ct, "run", "void", Collections.singletonList("java.lang.String[]"));


Listing 1.4. Definition of a class type and a method signature using SOOTUP 


The method signature that we created (mSig) can now be used to query the
actual method object from the view. This is shown at Line 10 in Listing 1.5. As
the new API follows modern Java best practices, view.getMethod() returns
an optional. At Line 11, we therefore test this optional for its presence and obtain
the method's body. At Line 12, we output all the statements of the method.


10  view.getMethod(mSig)
11      .ifPresent(method -> method.getBody()
12          .getStmts().forEach(System.out::println));


Listing 1.5. Output all statements in a method body using SOOTUP


3.3 Call Graph Generation 


A call graph models the calls between the methods of a target program, which 
makes it an essential data structure when performing interprocedural program 
analyses. SOOTUP's new call graph framework is based on a generic notion of 
a CallGraphAlgorithm, which can be extended by specific call graph algorithm 
implementations. The call graph algorithms only need to specify how they resolve 
a call. Resolving can be based on the static class hierarchy (e.g. CHA [7], RTA [2])
or based on sophisticated pointer analyses [17].


13  CallGraphAlgorithm cha = new ClassHierarchyAnalysisAlgorithm(view);
14  CallGraph cg = cha.initialize(Collections.singletonList(mSig));
15  cg.containsMethod(anotherMethod);
16  cg.callsFrom(mSig);


Listing 1.6. Call graph generation using SOOTUP 


Listing 1.6 shows an example of call graph generation using the new API.
Since the view maintains all the classes and methods, it needs to be passed to 
the call graph algorithm, e.g. the ClassHierarchyAnalysisAlgorithm at Line 
13. The call graph algorithm is initialized at Line 14, by specifying the entry 
method, which returns a CallGraph object. The call graph can be queried for 
method reachability, e.g. at Line 15, or can be iterated by retrieving the calls 
from the entry method, e.g. at Line 16. 


3.4 Body Interceptors 


Body interceptors in SOOTUP replace the concept of transformers in SOOT. 
They essentially allow modifying method bodies, for instance, to add, remove 
or replace statements. As with the other objects, methods are immutable by 
default. Therefore, in SOOTUP any modifications to the method body must be 
performed during the body-building phase. 


ClassLoadingOptions clo = new ClassLoadingOptions() {
    @Override
    public List<BodyInterceptor> getBodyInterceptors() {
        return Collections.singletonList(new DeadAssignmentEliminator());
    }
};

JavaView view = project.createView(analysisInputLocation -> clo);


Listing 1.7. Specifying Body Interceptors 


Listing 1.7 shows an example of specifying a body interceptor, in this case
the DeadAssignmentEliminator. Body interceptors must be defined as part of
the class loading options because they are applied during class loading. The
options are passed in when the view is created.


4 Tool Support


SOOT-based tools can be upgraded to use SOOTUP instead. However, the
upgrading effort varies with their implementation. We next present the
tools that SOOTUP currently supports and provides as submodules. We also
suggest a roadmap for switching SOOT-based tools to SOOTUP.


4.1 Heros


HEROS enables defining interprocedural dataflow analyses using the IFDS
(interprocedural, finite, distributive subset) and IDE (interprocedural
distributive environments) conceptual frameworks. Both frameworks reduce
dataflow analysis problems to graph reachability. While IDE is well suited to
analysis problems with large domains (such as typestate or constant propagation
analysis), IFDS is the primary choice for reachability analyses with a small
domain (e.g. taint analysis).


1  JimpleBasedInterproceduralCFG icfg =
2      new JimpleBasedInterproceduralCFG(view, entryMethod);
3
4  IFDSTaintAnalysisProblem problem =
5      new IFDSTaintAnalysisProblem(icfg, entryMethod);
6
7  JimpleIFDSSolver<?, InterproceduralCFG<Stmt, SootMethod>> solver =
8      new JimpleIFDSSolver(problem);
9
10 solver.solve();


Listing 1.8. IFDS analysis using HEROS 


SOOTUP provides the HEROS framework within its analysis submodule. Listing
1.8 shows an example of running an IFDS analysis using HEROS. SOOTUP
implements HEROS' InterproceduralCFG interface with the JIMPLE-specific
JimpleBasedInterproceduralCFG. To instantiate it, the client needs to pass
the view and an entry method, as shown at line 1. HEROS defines IFDS problems
through the abstract class DefaultIFDSTabulationProblem, which SOOTUP
extends with DefaultJimpleIFDSTabulationProblem. However, clients still
need to define their custom IFDS analyses with problem-specific lattices, flow
functions and merge operators. An example of a basic IFDS-based taint analysis
problem is available in SOOTUP and is instantiated at line 4. SOOTUP extends
HEROS' generic IFDSSolver with the JimpleIFDSSolver by concretizing
it with Stmt (equivalent to Unit in SOOT) and SootMethod.


4.2 Qilin 


Pointer information is an integral part of precise program analyses. SOOT’s 
pointer analysis frameworks, SPARK and its context-sensitive alternative 
PADDLE [18], have been popular in academia, as they provide a solid ground 
for researching novel algorithms. As we observe, however, the research trend is 
moving towards more sophisticated approaches with increased pointer analysis 
precision. For instance, context-sensitivity can be applied selectively rather than 
uniformly across the whole program [19]. 

QILIN is a state-of-the-art flow-insensitive pointer analysis framework
that was recently designed to support fine-grained selective context sensitivity
while subsuming traditional method-level context sensitivity as
a special case. Since QILIN is fully written in Java and operates on the JIMPLE
IR of SOOT, we were able to seamlessly incorporate QILIN into SOOTUP
as a submodule with only minor engineering effort. QILIN supports a rich
set of pointer analyses such as Andersen's context-insensitive analysis as
implemented in SPARK [17], k-limiting callsite-sensitive analysis [27], k-limiting
object-sensitive analysis [22, 28], and other recent advancements in pointer
analysis. By providing QILIN as a SOOTUP submodule, we aim to foster comparative
research using a broader set of pointer analysis algorithms.


PTAPattern ptaPattern = new PTAPattern("2o");
Collection entries = Collections.singleton(mainSig);
PTA pta = PTAFactory.createPTA(ptaPattern, view, entries);
pta.run();
CallGraph cg = pta.getCallGraph();


Listing 1.9. Call graph generation using a pointer analysis in QILIN


Listing 1.9 gives an example of 2-object sensitive pointer analysis using QILIN.
In lines 1 and 2 the flavor of pointer analysis is specified and the entry method 
is set. In line 3 an instance of 2-object sensitive analysis is created which is 
subsequently executed in line 4. As the pointer analysis in QILIN supports on- 
the-fly call graph construction, the resulting call graph is retrieved in line 5. 
In addition, the pointer analysis API in QILIN provides reachingObjects(), 
for computing the points-to set of any variable and mayAlias(), for checking 
whether two variables are aliases. Note that QILIN is not part of SOOTUP’s 
current release. 


4.3 Roadmap for Other Soot-based Tools 


SOOTUP is not a drop-in replacement for SOOT. It is essentially a complete 
rewrite with a new architecture and API. We therefore primarily recommend 
SOOTUP to be used for new projects. However, existing tools that are based on 
SOOT can be upgraded to SOOTUP with some effort. The SOOTUP team has
been working on upgrading some SOOT-based tools to SOOTUP. So far, we see
that the roadmap, and thus the effort, for a specific tool to upgrade to SOOTUP
differs heavily based on how the tool is implemented. We have seen three
recurring patterns: (1) generic tools that do not directly depend on SOOT, (2)
tools that depend on SOOT but work with their own domain objects, (3) tools
that depend on SOOT and work directly with SOOT objects. 

Generic tools can swiftly be upgraded to SOOTUP. For instance, the API
of the HEROS solver provides interfaces based on Java generics. Its interfaces
can be extended with concrete tool-specific objects. The only requirement for
SOOTUP to use the IFDS solver was to extend the necessary interfaces by providing
SOOTUP-specific objects.

Upgrading tools that use their own domain objects to SOOTUP is also simple.
For instance, BOOMERANG and SPARSEBOOMERANG [15], state-of-the-art
demand-driven pointer analysis frameworks, implement their core functionality
within their own domain objects that correspond to classes, methods and
statements. These tools require SOOTUP's objects to be converted to their domain
objects by implementing an adapter.

Upgrading tools that work directly with SOOT objects is a more complex task.
FLOWDROID [1], a popular Android information flow analysis tool, is highly
intertwined with SOOT. It is hard to determine where exactly the boundaries of
FLOWDROID are and how to separate it from SOOT. Therefore, at this point, we
anticipate that FLOWDROID and tools of similar nature need a major rewrite
to upgrade to SOOTUP. Nonetheless, we are considering upgrading even
FLOWDROID to SOOTUP in the future.


5 Development 


We next explain SOOTUP’s development process, and how one can extend or 
contribute to SOOTUP. 


5.1 SootUp's Development Process 


We started SOOTUP as a greenfield project. This choice granted us more
freedom not only to restructure its architecture but also to employ a more
modern software development process. Our new development process centers around
continuous quality assurance. SOOT lacked proper test coverage, which
complicated adding new features or any kind of nontrivial refactoring. To overcome this,
we made testing an integral part of SOOTUP from the very beginning. SOOTUP
comes with extensive unit and regression tests. We continuously monitor
its test coverage and require newly added code to maintain the same level of
coverage. SOOTUP's tests currently account for 63.70% line coverage (9656 out
of 15159 lines). To ensure that no new feature breaks or unintentionally changes
SOOTUP's behavior, tests are executed for every new commit to SOOTUP's code
repository through a continuous integration pipeline.

We seek to make SOOTUP more accessible to everyone. Our focus on an intuitive
API design, as we explained in Section 2.3, is the first step in this direction.
Further, we prioritize documentation and make it part of the development
process. Our public-facing API elements are required to have Javadoc. Yet, we have
learned from the questions in SOOT's issue tracker that Javadoc alone is
not enough. We thus maintain a documentation page to elaborate on some of
the main concepts of SOOTUP’s usage and provide more insight. To make the 
documentation beginner-friendly, we demonstrate the most common use cases 
with supporting code examples. From experience, we know that documentation 
tends to fall behind the most recent development state. To prevent this, we 
maintain the example code as part of SOOTUP’s code repository. By doing so 
we ensure that the example code always compiles and functions with the most 
recent state. 


Line coverage report: https://app.codecov.io/gh/soot-oss/SootUp

SOOTUP is currently published at Maven Central. We announced the
first release (v1.0.0) in December 2022. Since then, we have been frequently
releasing new features and bug fixes; the most recent version (v1.1.2) was
published in June 2023. While, due to existing tool dependencies, SOOT and SOOTUP
will coexist for a while, the bulk of our maintenance efforts will henceforth be
directed toward SOOTUP rather than SOOT.


5.2 Extending and Contributing to SootUp 


Concerning community engagement, SOOTUP will follow in the footsteps of
SOOT. While SOOTUP's development is currently still carried by Paderborn
University, we are open for others to join the team. The main motivation behind our
development efforts until the first release was to realize the design decisions laid
out in Section 2. Since the first release, we have been focusing more on
community feedback, such as bug reports and feature requests. Just like its predecessor,
we expect SOOTUP to be shaped around the needs and contributions of the
research community. We are eager to incorporate external contributions and very
much welcome feature and pull requests. Repeat contributors may become core 
development team members with full commit rights. 

To maintain an active community, we set up a discussion board on GitHub. 
This allows the community to participate in Q&As, suggest new ideas or simply 
discuss in an informal setting. SOOTUP is open-sourced under the GNU Lesser
General Public License v2.1 (LGPL-2.1) [11]. It allows SOOTUP to be modified as
long as the modifications are stated and licensed under the same license. 


6 Future Work 


SOOTUP is set to be the successor of the old SOOT framework. SOOT has been
developed and improved for more than 20 years, so there are still multiple
analysis utilities that need to be adapted to SOOTUP. Furthermore, we aim to keep up
with advancements in the field of static program analysis and implement support
for better call graph construction approaches and more precise pointer analysis
techniques in SOOTUP as they are developed.

Being able to analyze Android applications was one of the main reasons for
SOOT's popularity. SOOTUP currently allows one to analyze Android
applications with the help of dex2jar. This is an interim solution, as dex2jar is no
longer actively maintained. In the meantime, we are working on a more robust
solution based on Dexpler [3].

SOOTUP was designed with extensibility for other programming languages
in mind. To allow for cross-boundary program analyses, we aim to implement
new frontends for other languages. We especially aim at implementing a Python
and a JavaScript frontend, due to the popularity of these languages. SOOTUP's
IR can be extended to cover at least those languages that, unlike C/C++, do
not allow direct pointer accesses. However, language-specific challenges are
out of the scope of this paper and need to be further investigated in the future.

Another goal for SOOTUP is to provide a means to enable the analysis of
partial programs. To process an uncompiled Java source code project using SOOT
or SOOTUP, the whole code base of the project, alongside all its dependencies,
needs to be available either during compilation or during processing with the
source code frontend. However, in some scenarios only part of the code base is
available. In the future, we aim to provide support for processing such partial
programs by generating JIMPLE from only partially available source
code and substituting the missing information either with data that can be
inferred from whatever is available of the code base or with parts that users
specify additionally.

Performance comparison to SOOT or other tools was not possible because 
one would have to compare two identical analyses within these frameworks. 
Such analyses are still lacking at the moment. We nevertheless compared SOOTUP
to SOOT at the unit-test level. By design, SOOTUP shows significant performance
improvements, particularly in class loading. The immutable IR was also designed
to support much faster analyses than what is currently possible with SOOT's old
JIMPLE IR. In the future, as SOOTUP-based analyses mature, we will conduct
detailed performance evaluations. 

In the future, we also plan to perform more evaluations regarding SOOTUP's
usability. An API design that is as intuitive as possible for its users was one of the 
primary considerations when designing SOOTUP. To validate the API design, we 
plan to perform user studies with various types of user groups like researchers and 
software developers. Furthermore, we plan to benchmark SOOTUP’s performance 
and compare it against other analysis frameworks and especially its predecessor. 


7 Related Work 


Apart from SOOT, there are various research-oriented static analysis frameworks. 
The most notable ones for Java are WALA [22], Doop [5] and OPAL [9]. WALA
enables analyzing multiple programming languages such as Java, JavaScript, and
recently also Python [8]. It focuses on efficient static analysis by using specialized
data structures. WALA's IR is close to JVM bytecode, but in contrast, it is based
on SSA (static single assignment). Instead of operand stacks, it uses symbolic
registers. SOOTUP is currently integrated with WALA's source code frontend,
which enables SOOTUP to support source code in the same capacity as WALA
does. Doop was originally developed as a pointer analysis framework. It enables
defining static analyses declaratively and uses a Datalog solver. Doop's IR is also
based on JIMPLE. It could probably be upgraded to SOOTUP with minor effort.
OPAL provides highly configurable static analysis using abstract interpretation. 
PhASAR is another notable static analysis framework that enables static
analysis for C and C++ applications through the LLVM IR. The LiSA static
analysis library enables novice users to implement static analyses that can target
arbitrary languages based on the IMP programming language.


8 Conclusion


We have presented SOOTUP, a complete overhaul of the popular SOOT opti- 
mization and analysis framework for Java. SOOTUP shifts the purpose from 
optimization to static code analysis and fully modernizes the original SOOT im- 
plementation. SOOTUP implements all the lessons learned from the last 20+ 
years of development and usage of the original SOOT framework. It comprises
many improvements like a new user-centric API, a fully parallelizable
architecture and a new variant of the Jimple intermediate representation offering
extensibility for multi-language support. With all these changes and
improvements in place, SOOTUP aims to be a worthy successor of the good old SOOT
framework and to enable the implementation of modern Java code analyses.


Abstract. Distributed architectures are used to improve performance
and reliability of various systems. Examples include drone swarms and
load-balancing servers. An important capability of a distributed architecture
is the ability to reach consensus among all its nodes. Several consensus
algorithms have been proposed, and many of these algorithms come
with intricate proofs of correctness that are not mechanically checked.
In the controls community, algorithms often achieve consensus asymptotically,
e.g., for problems such as the design of human control systems,
or the analysis of natural systems like bird flocking. This is in contrast
to exact consensus algorithms such as Paxos, which have received much
more recent attention in the formal methods community.

This paper presents the first formal proof of an asymptotic consensus al- 
gorithm, and addresses various challenges in its formalization. Using the 
Coq proof assistant, we verify the correctness of a widely used consen- 
sus algorithm in the distributed controls community, the Weighted-Mean 
Subsequence Reduced (W-MSR) algorithm. We formalize the necessary 
and sufficient conditions required to achieve resilient asymptotic con- 
sensus under the assumed attacker model. During the formalization, we 
clarify several imprecisions in the paper proof, including an imprecision 
on quantifiers in the main theorem. 


Keywords: Resilient asymptotic consensus · W-MSR algorithm · Network robustness.


1 Introduction 


To enhance reliability, robustness and performance, many modern systems use a 
distributed architecture, composed of multiple nodes communicating with each 
other. Examples range from coordinated control of multi-robot systems such as 
swarms of mobile and aerial robots, to load-balancing among servers answering 
many queries per second. A fully decentralized system, where decisions are made 
collectively by the nodes rather than by one master node, greatly improves reli- 
ability by ensuring there is no single point of failure in the system. A distributed 
architecture also provides greater performance (depending on the context, in
terms of load capacity, reduced latency, smaller communication overhead, etc.)
than any single node could ever achieve. Distributed architectures are supported
by distributed algorithms, which particularly focus on carefully handling
situations where some nodes become faulty, stop responding, or become malicious.

© The Author(s) 2024


One central aspect of distributed algorithms is the ability to achieve consen- 
sus. Consensus is said to be achieved in a network if all normal (correct) nodes 
agree on a certain value, where a node is normal if it is not faulty [34]. The 
value agreed upon by all nodes can be a reference point for the next position 
of a swarm, or the sequence of commands executed by a set of replicas in State 
Machine Replication [44]. Consensus has been studied extensively in different 
communities. In the distributed computer systems communities, some promi- 
nent algorithms achieving consensus are Paxos [29], MultiPaxos [47], Raft [36], 
and Practical Byzantine Fault Tolerance (PBFT) [6]. However, these algorithms 
deal with the problem of exact consensus. There are many scenarios where exact 
consensus is not achievable, ranging from the design of human controlled sys- 
tems to analysis of natural systems like bird flocking. These problems have to be 
solved under harsh environmental restrictions such as restricted communication 
abilities and presence of communication uncertainty. Therefore, these problems 
warrant the study of asymptotic consensus problems, which, unlike exact
consensus, do not require strong assumptions on the underlying network [16].


This paper presents the first formal proof of an asymptotic consensus al- 
gorithm, by formalizing the Weighted-Mean Subsequence Reduced (W-MSR) 
algorithm [30, 50]. The problem of asymptotic consensus is of much importance 
to the distributed robotics and controls community, who have studied algorithms 
like the Mean Subsequence Reduced (MSR) algorithm [27] and its recent exten- 
sion W-MSR. These algorithms are designed to achieve asymptotic consensus in 
partially connected groups of nodes, but have not been formally verified. Formal 
verification of consensus algorithms is important as has been emphasized by the 
distributed computer systems community, who have long invested in producing 
mechanically checked proofs of its consensus protocols. The controls commu- 
nity, however, lags behind in this direction. In recent years, the distributed sys- 
tems community has embraced formal methods to provide mechanically-checked 
proofs of its consensus protocols and their implementations, using a wide range 
of techniques from interactive and automated theorem proving [48, 25, 8, 5, 18,
9, 31] to automatic generation of inductive invariants [33, 21, 49, 20]. In the
distributed robotics and controls community, however, researchers usually prove
their consensus protocols with paper proofs, using mathematical analysis based 
on Lyapunov theory and its extensions, without computer-checked formaliza- 
tions. As we show in this paper, our formalization of asymptotic consensus for 
the W-MSR algorithm [30] reveals imprecisions in the placement of quantifiers 
in the main theorem and several missing pieces in the proof, thereby
highlighting the importance of machine-checked proofs. Thus a significant contribution
of our work is providing the first mechanically checked formalization of
asymptotic consensus and its application to the W-MSR algorithm, widely used in the
controls community. We have chosen to formalize this algorithm since it is a
widely-used algorithm for resilient consensus [42, 41, 46]. From the perspective
of practical applications, enabling resilient consensus in the presence of
misbehaving or faulty nodes is desirable for many applications in autonomous systems
and robotics, e.g., for coordinated control of multi-robot systems.

The MSR and W-MSR algorithms are very different from exact consensus 
algorithms such as MultiPaxos, Raft or PBFT. As such our formal verification 
of the correctness of W-MSR uses different techniques than previous proofs of 
exact consensus algorithms. The first major difference is that MSR and W-MSR 
guarantee asymptotic consensus rather than finite-time consensus. A second ma- 
jor difference is that MSR and W-MSR provide consensus in networks that are 
not fully connected: two normal nodes might not be able to communicate with 
each other directly, but might have to rely on another (possibly faulty) node to 
forward their messages to each other. This last property is crucial to model multi- 
robot systems where complete communication between any two robots may not 
be feasible at all times. Because of those differences, providing a
mechanically-checked proof of W-MSR requires the development and use of different
techniques than the ones typically used to mechanically check MultiPaxos, Raft or
PBFT. In particular, our formalization crucially relies on formalization of limits 
and real analysis, because many of the techniques used in model-checking or for 
generating invariants are not well-suited to prove asymptotic properties. 

Contributions: The original contribution of this work is the formalization in
the Coq theorem prover of the convergence results of the W-MSR algorithm [30]. 
Specifically, we provide a machine-checked concrete counterexample for the proof 
of necessity, a clean proof of Lemma 1 and the Coq formalization of the main the- 
orem (Theorem 1). We also fill in several missing details and clarify imprecisions 
in the proof of sufficiency, which can be viewed as an addition to the existing 
proof [30]. Additionally, this is, to our knowledge, the first mechanical formal- 
ization of a consensus algorithm where the consensus is obtained asymptotically, 
opening the door to more such proofs. 

This paper is organized as follows. In Section 2, we discuss the problem
setup and define terminologies related to graph topology and the W-MSR al- 
gorithm [30]. In Section 3, we discuss the formalization of the necessary and 
sufficient conditions in Coq, for achieving resilient asymptotic consensus. We 
also discuss some specific challenges we encountered during the formalization. 
After reviewing some related work in Section 4, we conclude in Section 5 by 
discussing key takeaways from our work and generic challenges we encountered 
during the formalization. We also lay down a few directions that could be ad- 
dressed in future work. 


2 Preliminaries 


In this paper we consider the problem of formalizing consensus in a network,
and adopt the problem formulation from [30]. While the original paper discusses
consensus in a distributed control graph for both malicious and Byzantine threat
models and for both time-varying and time-invariant graph structures, we limit our
formalization to the case of a time-invariant graph for a malicious threat model
and for a particular threat scope: F-total, where the total number of malicious
nodes in the control graph is bounded. We will next discuss briefly what each of
these highlighted terms means in the context of the following problem.


2.1 Problem formulation 


Consider a network that is modeled by a digraph (directed graph), $\mathcal{D} = (\mathcal{V}, \mathcal{E})$,
where $\mathcal{V} = \{1, \dots, n\}$ is the node set and $\mathcal{E} \subset \mathcal{V} \times \mathcal{V}$ is the directed edge set.
The node set is partitioned into a set of normal nodes $\mathcal{N}$, and a set of adversary
nodes $\mathcal{A}$, which are unknown a priori to the normal nodes. Each directed edge
$(j, i) \in \mathcal{E}$ models information flow and indicates that node $i$ can be influenced
by (or receive information from) node $j$ at time-step $t$. The set of in-neighbors
of node $i$ is defined as $\mathcal{V}_i = \{j \in \mathcal{V} \mid (j, i) \in \mathcal{E}\}$. Intuitively, the set of in-neighbors
contains all neighboring nodes of $i$ such that the direction of information flow
is from those nodes to $i$. The cardinality of the set of in-neighbors is called the
in-degree, $d_i = |\mathcal{V}_i|$. Since each node has access to its own value at time-step $t$,
we also consider a set of inclusive neighbors of node $i$, denoted by $\mathcal{J}_i = \mathcal{V}_i \cup \{i\}$.
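
As a small worked example of these definitions, consider a hypothetical three-node digraph (chosen only for illustration, it does not appear in [30]). Reading each edge $(j, i)$ as "node $j$ informs node $i$" gives the following in-neighbor sets, in-degrees and inclusive neighbor sets:

```latex
\begin{align*}
\mathcal{V} &= \{1, 2, 3\}, & \mathcal{E} &= \{(1,2),\ (3,2),\ (2,3)\} \\
\mathcal{V}_1 &= \emptyset, & d_1 &= 0, & \mathcal{J}_1 &= \{1\} \\
\mathcal{V}_2 &= \{1, 3\}, & d_2 &= 2, & \mathcal{J}_2 &= \{1, 2, 3\} \\
\mathcal{V}_3 &= \{2\}, & d_3 &= 1, & \mathcal{J}_3 &= \{2, 3\}
\end{align*}
```

Node 1 has no incoming edge, so it can only ever use its own value, while node 2 hears from both other nodes.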


2.2 Threat Model


As discussed earlier, we formalize a threat model (F-total malicious model [30]) 
in which every adversary node in the graph is malicious, and there exists an 
upper bound F on the number of malicious agents in the graph, i.e., the set 
of adversary nodes is F-totally bounded. In the context of the problem in
Section 2.1, some relevant formal definitions pertaining to the threat model are 
stated as: 


Definition 1 (Malicious node [30]). A node $i \in \mathcal{A}$ is called Malicious if it
sends the same value $x_i(t)$ to all its neighbors at each time step $t$, but applies a
different update function $f_i(\cdot)$ at some time step.


Definition 2 (F-total set [30]). A set $S \subset \mathcal{V}$ is F-total if it contains at most
F nodes in the network, i.e., $|S| \le F$, $F \in \mathbb{Z}_{\ge 0}$.


Definition 3 (F-totally bounded [30]). A set of adversary nodes is F-totally 
bounded if it is an F-total set. 


Note that while Definitions 2 and 3 may appear similar, they define different 
terminologies. Definition 2 defines an F-total set with at most F nodes in a 
network. Definition 3 specializes this to a set of adversary nodes saying that 
there are at most F adversarial nodes in a network. 


2.3 Robust network topologies 


The ability of a set of normal nodes in a control graph to achieve consensus 
depends on its ability to make local decisions effectively. Le Blanc et al. [30] 
defined a topological property called network robustness for reasoning about the 
effectiveness of purely local algorithms to succeed, which we formalize in Coq. 
In particular, they define a property called (r, s)-robustness, which is stated as:


Definition 4 ((r, s)-robustness [30]). A digraph $\mathcal{D} = (\mathcal{V}, \mathcal{E})$ on $n$ nodes
($n \ge 2$) is (r, s)-robust, for a nonnegative integer $r \in \mathbb{Z}_{\ge 0}$ and $1 \le s \le n$, if for
every pair of nonempty, disjoint subsets $S_1$ and $S_2$ of $\mathcal{V}$ at least one of the
following holds: (i) $|\mathcal{X}_{S_1}^r| = |S_1|$; (ii) $|\mathcal{X}_{S_2}^r| = |S_2|$; (iii) $|\mathcal{X}_{S_1}^r| + |\mathcal{X}_{S_2}^r| \ge s$, where
$\mathcal{X}_{S_k}^r = \{i \in S_k : |\mathcal{V}_i \setminus S_k| \ge r\}$ for $k \in \{1, 2\}$.


The condition (iii) states that there are a total of at least $s$ nodes in the union
of sets $S_1$ and $S_2$ such that each of those nodes has at least $r$ in-neighbors outside
of its respective set. The idea is that “enough” nodes in
every pair of nonempty, disjoint sets $S_1, S_2 \subset \mathcal{V}$ have at least $r$ neighbors outside
of their respective sets. This ensures that the network is well connected, and that
loss of information from a node due to malicious attack does not affect the whole
network. Figure 1 illustrates an example of a network with (2,2) robustness.
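
On small graphs, the three conditions of (r, s)-robustness can be checked by brute force. The following Java sketch is only an illustration and is not part of the Coq development: the class RobustnessCheck, its method names, and the bitmask encoding of in-neighbor sets are assumptions made for this example. It enumerates every ordered pair of nonempty, disjoint subsets, so its running time is exponential in the number of nodes.

```java
// Brute-force (r, s)-robustness check for small digraphs (hypothetical helper).
final class RobustnessCheck {
    // inNeighbors[i] is a bitmask of the in-neighbors of node i:
    // bit j is set exactly when the edge (j, i) exists. Requires fewer than 31 nodes.
    static boolean isRobust(int[] inNeighbors, int r, int s) {
        int n = inNeighbors.length;
        int full = (1 << n) - 1;
        for (int s1 = 1; s1 <= full; s1++) {
            int rest = full & ~s1;
            // Enumerate every nonempty subset s2 of the nodes outside s1.
            for (int s2 = rest; s2 > 0; s2 = (s2 - 1) & rest) {
                int x1 = countWithOutsideNeighbors(inNeighbors, s1, r);
                int x2 = countWithOutsideNeighbors(inNeighbors, s2, r);
                boolean holds = x1 == Integer.bitCount(s1)   // condition (i)
                        || x2 == Integer.bitCount(s2)        // condition (ii)
                        || x1 + x2 >= s;                     // condition (iii)
                if (!holds) return false;
            }
        }
        return true;
    }

    // Size of X^r_S: nodes of S with at least r in-neighbors outside S.
    static int countWithOutsideNeighbors(int[] inNeighbors, int set, int r) {
        int count = 0;
        for (int i = 0; i < inNeighbors.length; i++) {
            boolean inSet = (set & (1 << i)) != 0;
            if (inSet && Integer.bitCount(inNeighbors[i] & ~set) >= r) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Complete digraph on 4 nodes: every node hears from the other three.
        int[] k4 = {0b1110, 0b1101, 0b1011, 0b0111};
        System.out.println(isRobust(k4, 2, 2)); // true
        System.out.println(isRobust(k4, 3, 1)); // false
    }
}
```

The complete digraph on four nodes passes the (2,2) check because at least one of any two disjoint sets has at most two nodes, and each of its nodes then has at least two in-neighbors outside the set. It fails the (3,1) check because, for two disjoint sets of size two, every node has only two in-neighbors outside its own set.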


Fig. 1. Illustration for (2,2) robustness. In the illustration (a), every node of the set
$S_2$ has 2 neighboring nodes outside $S_2$. Similarly every node in the set $S_1$ has at least
2 neighboring nodes outside $S_1$. In the illustration (b), there are 2 nodes in the union
$S_1 \cup S_2$ that have 2 neighbors outside the set. Note that the sets $S_1$ and $S_2$ are disjoint.


2.4 Update model for the normal nodes 


In this paper, we formalize a consensus algorithm called the W-MSR
algorithm [30]. This algorithm provides an update model for the normal nodes in
the network. A schematic of the algorithm is illustrated in Figure 2. We denote
the value emitted by node $i$ at time $t$ as $x_i(t)$, and the value of the directed
weighted edge from node $j$ to node $i$ at time $t$ as $w_{ij}(t)$. The value $x_i(t)$ could
represent a measurement like position or velocity, or it could be an optimization
variable. The quantity $x_j^i(t)$ is the information that the $j$-th node in the
neighboring set of node $i$ sends to node $i$. Each node also has a varying set of
neighbors which it ignores, which we denote as $\mathcal{R}_i(t)$. The set $\mathcal{R}_i(t)$ changes
because nodes are removed depending on their value with respect to the value
of node $i$ at time $t$. In this algorithm, the updated value of a normal node $i$
at time $t+1$ is a convex combination of the values of its neighboring set including
itself. Hence, $x_i(t+1) = \sum_{j \in \mathcal{J}_i \setminus \mathcal{R}_i(t)} w_{ij}(t)\, x_j^i(t)$, where we assume the existence
of a constant $\alpha \in \mathbb{R}$, such that $0 < \alpha < 1$, and the weights $w_{ij}(t)$ satisfy the
conditions:
1. $w_{ij}(t) = 0$ whenever $j \notin \mathcal{J}_i$;
2. $w_{ij}(t) \ge \alpha$, $\forall j \in \mathcal{J}_i$; and
3. $\sum_{j \in \mathcal{J}_i \setminus \mathcal{R}_i(t)} w_{ij}(t) = 1$
for all $i \in \mathcal{N}$ and $t \in \mathbb{Z}_{\ge 0}$. It is important to note that the third condition
depends on the set of removed nodes, which may change over time. In order to
satisfy this condition the values of the weights may need to change over time.
The choice of neighboring sets in the W-MSR algorithm is defined as follows:


1. At each time-step $t$, each normal node $i$ obtains the values of its neighbors,
and forms a sorted list.

2. If there are fewer than $F$ nodes with values strictly greater than the value
of $i$, then the normal node removes all those nodes. Otherwise, it removes
precisely the largest $F$ values in the sorted list. Likewise, if there are fewer
than $F$ nodes with values strictly less than the value of the normal node $i$, the
normal node removes all such nodes. Otherwise, it removes precisely the smallest $F$
nodes in the sorted list.
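
The two steps above can be sketched in Python as follows. This is an illustrative sketch only, not the Coq development: the function names `wmsr_keep` and `wmsr_step`, the dictionary representation of node values, and the use of uniform weights over the kept nodes (one assignment whose weights sum to 1) are assumptions made for this example.

```python
from typing import Dict, Hashable, List


def wmsr_keep(i: Hashable, values: Dict[Hashable, float],
              neighbors: List[Hashable], F: int) -> List[Hashable]:
    """Return node i together with the neighbors that survive trimming."""
    own = values[i]
    ordered = sorted(neighbors, key=lambda j: values[j])  # ascending by value
    smaller = [j for j in ordered if values[j] < own]
    equal = [j for j in ordered if values[j] == own]
    larger = [j for j in ordered if values[j] > own]
    # Remove the F smallest strictly smaller values (all of them if fewer than F).
    kept_smaller = smaller[min(F, len(smaller)):]
    # Remove the F largest strictly larger values (all of them if fewer than F).
    kept_larger = larger[:max(len(larger) - F, 0)]
    return kept_smaller + equal + [i] + kept_larger


def wmsr_step(i, values, neighbors, F):
    """One update of normal node i with uniform weights over the kept nodes."""
    kept = wmsr_keep(i, values, neighbors, F)
    weight = 1.0 / len(kept)  # assumed weights; they sum to 1 over `kept`
    return sum(weight * values[j] for j in kept)


values = {0: 0.5, 1: 0.0, 2: 1.0, 3: 100.0, 4: 0.4}
kept = wmsr_keep(0, values, [1, 2, 3, 4], F=1)
new_value = wmsr_step(0, values, [1, 2, 3, 4], F=1)
```

In this example node 0 discards the outlier 100.0 and the smallest value 0.0, and averages the remaining values 0.4, 0.5 and 1.0.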


[Figure 2 schematic. Labels in the figure: "Neighboring nodes are sorted in ascending
order. The order is decided by their values w.r.t. the value of node i." "Remove top F
nodes with values greater than or equal to the value of node i. If there are less than F
nodes with values greater than the value of node i, all of them are removed." "Remove
bottom F nodes with values less than or equal to the value of node i. If there are less
than F nodes with values less than the value of node i, all of them are removed."]

Fig. 2. Schematic of the W-MSR update. At time $t$, the node $i$ obtains values from
its neighbors and forms a sorted list. The algorithm then removes the largest and the
smallest $F$ nodes in the sorted list, or if there are fewer than $F$ nodes with values strictly
greater than or less than the value of $i$, the algorithm removes all those nodes.


An important point to note here is that the above update model holds only for
the normal nodes, i.e., $i \in \mathcal{N}$. The update function for adversary nodes, i.e.,
$i \in \mathcal{A}$, and their influence on the normal nodes depend on the threat model. We
will next discuss the formalization of the W-MSR algorithm in Coq.


3 A formal proof of consensus for the W-MSR algorithm 


Theorem 1. [30] Consider a time-invariant network modeled by a digraph
$\mathcal{D} = (\mathcal{V}, \mathcal{E})$ where each normal node updates its value according
to the W-MSR algorithm with parameter $F$. Under the $F$-total malicious model, resilient
asymptotic consensus is achieved if and only if the network topology is
$(F+1, F+1)$-robust.

The proof of this theorem requires us to prove both a sufficiency and a necessity
condition. The original paper proof relies on a safety condition, which provides
an invariant that must hold at all times in the state update. We will
next discuss the proof of the safety condition (Section 3.1), then the sufficiency
(Section 3.2) and necessity (Section 3.3) conditions individually.


3.1 Proof of the safety condition in W-MSR 


Lemma 1 (Safety condition). [30] Suppose each node updates its value according
to the W-MSR algorithm with parameter $F$ under the $F$-total malicious
model. Then for each node $i \in \mathcal{N}$, $x_i(t+1) \in [m(t), M(t)]$, regardless of the
network topology.


Here, $m(t) = \min_{i \in \mathcal{N}} \{x_i(t)\}$ and $M(t) = \max_{i \in \mathcal{N}} \{x_i(t)\}$. Note that the
original paper [30] does not provide a proof of this lemma, and our proof, which
we formalize in this paper, is an original contribution. We provide a detailed
proof of the lemma by explicitly enumerating the cases from the definition of
the W-MSR algorithm. On the other hand, the original paper [30] merely states
an outline, making a careful check of the proof difficult.


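Proof. We prove Lemma 1 by showing inductively that at each time $t$, and for
every normal node $i$, there exists a node $j_1 \in \mathcal{J}_i \cap \mathcal{N}$ such that
$\forall k \in \mathcal{J}_i \setminus \mathcal{R}_i(t)$, $x_{j_1}(t) \leq x_k(t)$, thus:

$$x_i(t+1) = \sum_{j \in \mathcal{J}_i \setminus \mathcal{R}_i(t)} w_{ij}(t)\, x_j(t)
\geq \sum_{j \in \mathcal{J}_i \setminus \mathcal{R}_i(t)} w_{ij}(t)\, x_{j_1}(t)
= x_{j_1}(t) \geq m(t) \qquad (1)$$

Symmetrically, there exists a $j_2 \in \mathcal{J}_i \cap \mathcal{N}$ such that
$\forall k \in \mathcal{J}_i \setminus \mathcal{R}_i(t)$, $x_{j_2}(t) \geq x_k(t)$.
Thus, the symmetric inequality $x_i(t+1) \leq M(t)$ holds for the same reason. Since
the proofs of the existence of $j_1$ and $j_2$ are nearly identical, we only show the
proof of the former in Appendix A of the extended version [45].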
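
The safety condition can also be observed numerically. The sketch below is not part of the formal proof and rests on several assumptions chosen for illustration: a complete graph on five nodes, $F = 1$, node 4 acting as the single malicious node with an arbitrary alternating value, and uniform weights over the kept nodes. Exact rational arithmetic (`fractions.Fraction`) is used so that rounding cannot push a value outside $[m(t), M(t)]$.

```python
from fractions import Fraction


def kept_nodes(i, x, nodes, F):
    """Inclusive neighbors of i (complete graph) that survive W-MSR trimming."""
    others = sorted((j for j in nodes if j != i), key=lambda j: x[j])
    smaller = [j for j in others if x[j] < x[i]]
    equal = [j for j in others if x[j] == x[i]]
    larger = [j for j in others if x[j] > x[i]]
    return (smaller[min(F, len(smaller)):] + equal + [i]
            + larger[:max(len(larger) - F, 0)])


def safety_holds(steps=20, F=1):
    """Check m(t) <= x_i(t+1) <= M(t) for every normal node at every step."""
    nodes = [0, 1, 2, 3, 4]
    normal = [0, 1, 2, 3]  # node 4 plays the malicious node (assumption)
    x = {0: Fraction(0), 1: Fraction(1), 2: Fraction(2), 3: Fraction(3),
         4: Fraction(1000)}
    for t in range(steps):
        m = min(x[i] for i in normal)
        M = max(x[i] for i in normal)
        nxt = {}
        for i in normal:
            kept = kept_nodes(i, x, nodes, F)
            nxt[i] = sum(x[j] for j in kept) / len(kept)  # uniform weights
            if not (m <= nxt[i] <= M):
                return False
        # Arbitrary malicious behavior: alternate between two extreme values.
        nxt[4] = Fraction(-1000) if t % 2 == 0 else Fraction(1000)
        x = nxt
    return True


result = safety_holds()
```

With a single malicious node and $F = 1$, an extreme malicious value is always the largest or smallest entry in a normal node's sorted list, so it is trimmed before the average is taken.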


Formalization in Coq: We formalize Lemma 1 in Coq as: 


Lemma lem_1: ∀ (i:D) (t:nat) (mal:nat → D → R) (init:D → R)
  (A:D → bool) (w:nat → D * D → R),
  F_total_malicious mal init A w →
  wts_well_behaved A mal init w →
  i ∈ Normal A → ((x mal init A w (t+1) i ≤ M mal init A w t)
  ∧ (m mal init A w t ≤ x mal init A w (t+1) i)).


The definition of F_total_malicious states that the model is F-total malicious
if the set of adversary nodes is F-totally bounded (i.e., there are at most F
adversary nodes in the network) and all the adversary nodes are malicious. Here
A : D → bool is a tagging function. If A i == true, then i is classified as an
Adversary node; otherwise it is classified as a Normal node. mal : nat → D → R is an
arbitrary update function for a malicious node. Since we do not know beforehand
what this function looks like, we take it as a parameter. The function
init : D → R gives the initial value associated with a node. We define a malicious
node in Coq as a node in the graph for which the normal update model does
not hold, i.e., there exists a time $t$ such that
$x_i(t+1) \neq \sum_{j \in \mathcal{J}_i \setminus \mathcal{R}_i(t)} w_{ij}(t)\, x_j(t)$.

(** Condition for a node to have malicious behavior at a given time **)
Definition malicious_at_i_t (mal:nat → D → R) (init:D → R) (A:D → bool)
  (w:nat → D * D → R) (i:D) (t:nat): bool :=
  (x mal init A w (t+1) i) != Σ_{j ∈ J_i \ R_i(t)} ((x mal init A w t j) * (w t (i,j))).
(** Define maliciousness **)
Definition malicious (mal:nat → D → R) (init:D → R) (A:D → bool)
  (w:nat → D * D → R) (i:D) := ∃ t:nat, malicious_at_i_t mal init A w i t.


The second hypothesis wts_well_behaved states that we respect the three
conditions on weights that we discussed in Section 2.4. The assignment of weights
depends on whether a node $j \in \mathcal{J}_i \setminus \mathcal{R}_i(t)$ or not. Here, $\mathcal{J}_i$ denotes the inclusive
set of neighbors of the node $i$. $\mathcal{R}_i(t)$ denotes the set of nodes removed according
to the W-MSR algorithm, and we define $\mathcal{R}_i(t)$ in Coq as follows:

Definition remove_extremes (i:D) (l:seq D) (x:D → R) : (seq D) :=
  filter (fun (j:D) ⇒
  (((Rge_dec (x j) (x i)) || (F ≤ (index j l))) && (Rle_dec (x j) (x i)
  || (index j l ≤ ((size l) - F - 1))))) l.

Note that we use the filter function from the MathComp sequence library. This is
crucial as it gives us lemmas that allow us to assert that any node in $\mathcal{J}_i \setminus \mathcal{R}_i(t)$
satisfies the conditions of the filter. Additionally, the filter function requires that
its first argument has a pred type, D → bool in our case. Therefore, we need
our inequality operations to be decidable. Hence, we used the decidable versions
of the inequality operations, such as Rle_dec, provided by Coq's reals library
instead of its built-in ≤ operation. We then define the set $\mathcal{J}_i \setminus \mathcal{R}_i(t)$ in Coq as


Definition incl_neigh_minus_extremes
  (i:D) (x:D → R) : (seq D) := remove_extremes i (inclusive_neighbor_list i x) x.
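
The filter predicate can be mirrored in Python to see which positions of a sorted list it keeps. This is a sketch under stated assumptions: the list `l` is taken to be already sorted in ascending order of value and free of duplicates (so a position in the list plays the role of `index j l`), and Coq's truncated natural-number subtraction is emulated with `max(..., 0)`. The Python name `remove_extremes` simply reuses the name of the Coq definition for readability.

```python
def remove_extremes(i, l, x, F):
    """Keep node j when it passes both halves of the filter predicate."""
    upper = max(len(l) - F - 1, 0)  # nat subtraction in Coq truncates at 0
    return [j for idx, j in enumerate(l)
            if (x[j] >= x[i] or F <= idx)        # not among the F smallest below i
            and (x[j] <= x[i] or idx <= upper)]  # not among the F largest above i


x = {1: 0.0, 4: 0.4, 0: 0.5, 2: 1.0, 3: 100.0}
sorted_nodes = [1, 4, 0, 2, 3]  # ascending by value
kept_F1 = remove_extremes(0, sorted_nodes, x, F=1)
kept_F2 = remove_extremes(0, sorted_nodes, x, F=2)
```

For `F=1` the smallest and the largest entries are dropped. For `F=2` both smaller and both larger entries are dropped and only node 0 itself remains.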


Since $\mathcal{J}_i \setminus \mathcal{R}_i(t)$ is defined based on the value of node $i$, $x_i(t)$, which in turn
depends on A, mal, and init, wts_well_behaved also depends on A, mal, and init.
The trickiest parts of the proof of Lemma 1 rely on the fact that we want
$\mathcal{J}_i \setminus \mathcal{R}_i(t)$, when treated as a list, to be sorted. In order to fulfill this condition
we use the formalization of sorting found in the MathComp library. To do this
we first define a relation on D as:

Definition sorted_Dseq_rel (x: D → R) (i j : D) :=
  if Rle_dec (x i) (x j) then
    if (x i == x j) then (index i (enum D) ≤ index j (enum D)) else true
  else false.


This definition ensures that if $x_i(t) < x_j(t)$, then $i$ is ordered as less than $j$ with
respect to this relation. In the case of nodes with equal values we use an
arbitrary mechanism to break ties. Doing so ensures that this relation is total,
and satisfies transitivity, anti-symmetry, and reflexivity. This relation lets us use
the sorting lemmas in MathComp's path library [13], and it ensures the weaker
condition that we occasionally use in the proof:

Definition sorted_Dseq (x:D → R) (l:seq D) :=
  ∀ (a b:D), a ∈ l → b ∈ l → (index a l < index b l) → (x a ≤ x b).
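
A small Python sketch may help to see how the tie-breaking relation behaves. It assumes, for illustration only, that `enum` is a fixed enumeration of all nodes (standing in for `enum D`) and that values are stored in a dictionary. The helper names `sort_nodes` and `sorted_dseq` are hypothetical.

```python
from functools import cmp_to_key


def sorted_dseq_rel(x, enum, i, j):
    """True when i is ordered before or equal to j: by value, ties by enum index."""
    if x[i] <= x[j]:
        if x[i] == x[j]:
            return enum.index(i) <= enum.index(j)
        return True
    return False


def sort_nodes(x, enum, nodes):
    """Sort nodes with the relation above used as the comparison."""
    def cmp(i, j):
        if i == j:
            return 0
        return -1 if sorted_dseq_rel(x, enum, i, j) else 1
    return sorted(nodes, key=cmp_to_key(cmp))


def sorted_dseq(x, l):
    """Weaker property: an earlier position implies a value that is no larger."""
    return all(x[a] <= x[b] for p, a in enumerate(l) for b in l[p + 1:])


enum = ["a", "b", "c", "d"]
x = {"a": 2.0, "b": 1.0, "c": 2.0, "d": 0.5}
ordered = sort_nodes(x, enum, ["c", "a", "d", "b"])
```

Nodes "a" and "c" carry the same value, so their relative order is decided by their position in `enum`. The sorted result then satisfies the weaker `sorted_dseq` property.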




The biggest difficulty with formalizing this proof arises when dealing with the
case that $|\mathcal{R}_i^{<}(t)| < F$, where
$\mathcal{R}_i^{<}(t) := \{j \in \mathcal{J}_i : x_j(t) < x_i(t) \text{ and } idx_{\mathcal{J}_i}(x_j(t)) < F\}$,
and we define $idx_l(x_j(t))$ to be the index of the value $x_j(t)$ in a given list $l$
of values, or the size of $l$ if $x_j(t)$ is not present. In particular, we need to show that
$idx_{\mathcal{J}_i \setminus \mathcal{R}_i(t)}(j) = 0 \implies idx_{\mathcal{J}_i}(j) = |\mathcal{R}_i^{<}(t)|$.
This requires proving an extra lemma on the $\mathcal{J}_i$ list:

Lemma partition_incl: ∀ (i:D) (t:nat) (mal:nat → D → R)
  (init:D → R) (A:D → bool) (w:nat → D * D → R),
  inclusive_neighbor_list i (x mal init A w t) =
  (sort ((sorted_Dseq_rel (x mal init A w t)))
    (enum (R_i_less_than mal init A w i t))) ++
  (incl_neigh_minus_extremes i (x mal init A w t)) ++
  (sort ((sorted_Dseq_rel (x mal init A w t)))
    (enum (R_i_greater_than mal init A w i t))).


With this lemma, we can reason that the zero-th index of $\mathcal{J}_i \setminus \mathcal{R}_i(t)$ is the
$|\mathcal{R}_i^{<}(t)|$-th index of $\mathcal{J}_i$. Using this lemma, we can prove the existence of $j_1$ in
the proof of lem_1. Symmetrically, we can show the existence of $j_2$ such that
$\forall k \in \mathcal{J}_i \setminus \mathcal{R}_i(t)$, $x_{j_2}(t) \geq x_k(t)$. Tying it all together, we complete the proof of
the lemma lem_1 in Coq.
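
The shape of the partition can be illustrated with a short Python sketch. It is only an illustration under assumptions: the inclusive neighbor list is given already sorted in ascending order of value, and the function name `partition_inclusive` is hypothetical. It is not the Coq statement itself.

```python
def partition_inclusive(i, x, sorted_nodes, F):
    """Split the sorted inclusive neighbor list into (removed low, kept, removed high)."""
    smaller = [j for j in sorted_nodes if x[j] < x[i]]
    larger = [j for j in sorted_nodes if x[j] > x[i]]
    low = smaller[:min(F, len(smaller))]      # the removed smaller nodes
    high = larger[max(len(larger) - F, 0):]   # the removed larger nodes
    kept = [j for j in sorted_nodes if j not in low and j not in high]
    return low, kept, high


x = {1: 0.0, 4: 0.4, 0: 0.5, 2: 1.0, 3: 100.0}
sorted_nodes = [1, 4, 0, 2, 3]
low, kept, high = partition_inclusive(0, x, sorted_nodes, F=1)
```

Concatenating the three parts gives back the sorted list, and the first kept node sits at position `len(low)` in that list. This is the index fact used to find $j_1$.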


3.2 Proof of Sufficiency 


Lemma 2. [30] Consider a time-invariant network modeled by a digraph
$\mathcal{D} = (\mathcal{V}, \mathcal{E})$ where each normal node updates its value according to the
W-MSR algorithm with parameter $F$. Under the $F$-total malicious model, if a network is
$(F+1, F+1)$-robust, resilient asymptotic consensus is achieved.


This is an important lemma because we would like to design a network such that
the normal nodes in the network reach an asymptotic consensus in the presence
of malicious nodes. Next we discuss an informal proof of
Lemma 2 followed by its formalization in the Coq proof assistant.


Proof. The proof of Lemma 2 is done by contradiction. We start by assuming
that the limits $A_M$ and $A_m$ of the functions $M(t)$ and $m(t)$ respectively are
different, i.e., $A_M \neq A_m$. These limits exist because $M(t)$ and $m(t)$ are both
monotone and bounded functions of $t$: by Lemma 1, $M(t)$ is non-increasing and
$m(t)$ is non-decreasing. Therefore, by definition of limits for $M(t)$ and $m(t)$,
we know that $\forall t$, $A_M \leq M(t) \wedge m(t) \leq A_m$, as illustrated in Figure 3. We
will show that by carefully constructing the sets $S_1$ and $S_2$ in the definition of
$(r, s)$-robustness, and unrolling the definition of $(r, s)$-robustness at every time-step
inductively, we eventually arrive at the desired contradiction:
$\exists t$, $M(t) < A_M \vee A_m < m(t)$. We discuss the details of the proof in Appendix B of the
extended version [45].


Fig. 3. Illustration of the tube of convergence bounded above by $A_M + \epsilon$ and bounded
below by $A_m - \epsilon$. We observe the behavior of functions $M(t)$ and $m(t)$ inside this tube
of convergence $\forall t \geq t_\epsilon$. We prove that $M(t)$ and $m(t)$ are monotonous $\forall t \geq t_\epsilon$, and
they approach the limits $A_M$ and $A_m$, respectively. We start by assuming that
$A_M \neq A_m$, but later prove that $A_M = A_m$ by contradiction, thereby proving asymptotic
consensus.


Formalization in Coq: We introduce the following axiom in Coq to support
reasoning by contradiction.


Axiom proposition_degeneracy : ∀ A : Prop, A = True ∨ A = False.


This is a propositional completeness axiom that allows us to reason classically
and is consistent with the formalization of classical facts in Coq's standard
library. We need this axiom because we prove the sufficiency condition by
contradiction. We choose to use classical reasoning because the original
paper [30] does not provide a constructive proof. The reasoning used in the
paper is classical. This requires us to state the following lemma in Coq:


Lemma P_not_not_P: ∀ (P:Prop), P ↔ ¬(¬ P).


The proof of P_not_not_P uses the axiom proposition_degeneracy.
We state the sufficiency condition (Lemma 2) for the network to achieve resilient
asymptotic consensus as follows in Coq.


Lemma strong_sufficiency:
  ∀ (A:D → bool) (mal:nat → D → R) (init:D → R) (w: nat → D * D → R),
  nonempty_nontrivial_graph →
  (0 < F+1 ≤ |D|)%N →
  wts_well_behaved A mal init w →
  r_s_robustness (F + 1) (F + 1) →
  Resilient_asymptotic_consensus A mal init w.


The sufficiency condition requires that the graph is non-trivial, i.e., there are at
least two nodes in the graph, and that the number of faulty nodes F in the graph is
bounded by the total number of nodes |D|. We define r_s_robustness in Coq as


Definition r_s_robustness (r s:nat) :=
  nonempty_nontrivial_graph ∧ ((1 ≤ s ≤ |D|) →
  ∀ (S1 S2: {set D}),
  (S1 ⊂ Vertex ∧ (|S1| > 0)) →
  (S2 ⊂ Vertex ∧ (|S2| > 0)) →
  [disjoint S1 & S2] →
  ((|Xi_S_r S1 r| == |S1|) || (|Xi_S_r S2 r| == |S2|) ||
   (|Xi_S_r S1 r| + |Xi_S_r S2 r| ≥ s))).


where Xi_S_r S1 r is the set of all nodes in the set S1 that each
have at least r neighboring nodes outside S1. In Coq, we define Xi_S_r as


Definition Xi_S_r (S: {set D}) (r:nat) :=
  [set i:D | i ∈ S & (|(in_neighbor i) \ S| ≥ r)].
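
For small graphs, the robustness condition can be checked by brute force. The following Python sketch is an assumption-laden illustration rather than a translation of the Coq definition: it represents the graph as a dictionary `in_nbrs` from each node to the set of its in-neighbors, enumerates every pair of non-empty disjoint subsets, and omits the side conditions on non-triviality and on the range of `s`. The names `xi_s_r` and `is_r_s_robust` are hypothetical.

```python
from itertools import product


def xi_s_r(S, r, in_nbrs):
    """Nodes of S that have at least r in-neighbors outside S."""
    return {i for i in S if len(in_nbrs[i] - S) >= r}


def is_r_s_robust(nodes, in_nbrs, r, s):
    """Check the three-way disjunction for every pair of non-empty disjoint subsets."""
    nodes = list(nodes)
    # Label each node 0 (in neither set), 1 (in S1) or 2 (in S2).
    for labels in product((0, 1, 2), repeat=len(nodes)):
        S1 = {v for v, lab in zip(nodes, labels) if lab == 1}
        S2 = {v for v, lab in zip(nodes, labels) if lab == 2}
        if not S1 or not S2:
            continue
        X1, X2 = xi_s_r(S1, r, in_nbrs), xi_s_r(S2, r, in_nbrs)
        if not (len(X1) == len(S1) or len(X2) == len(S2)
                or len(X1) + len(X2) >= s):
            return False
    return True


# Complete graph on 4 nodes and an undirected path 0 - 1 - 2 - 3.
K4 = {v: {u for u in range(4) if u != v} for v in range(4)}
PATH = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}

k4_robust = is_r_s_robust(range(4), K4, 2, 2)
path_robust = is_r_s_robust(range(4), PATH, 2, 2)
```

The complete graph passes the (2,2) check, while the path fails it: taking the two end nodes as $S_1$ and $S_2$, neither has two neighbors outside its own set. The enumeration is exponential in the number of nodes, so this is only practical for very small examples.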


We define Resilient_asymptotic_consensus in Coq as


Definition Resilient_asymptotic_consensus
  (A:D → bool) (mal:nat → D → R) (init:D → R) (w:nat → D * D → R) :=
  (F_total_malicious mal init A w) → (∃ L:Rbar, ∀ (i:D),
  i ∈ (Normal A) → is_lim_seq (fun t: nat ⇒ x mal init A w t i) L) ∧
  (∀ t:nat, (m mal init A w 0 ≤ m mal init A w t) ∧
  (M mal init A w t ≤ M mal init A w 0)).


Here, is_lim_seq is a predicate in Coquelicot that defines limits of sequences.
Rbar is the extended set of reals, which includes $+\infty$ and $-\infty$. To prove that the
network achieves resilient asymptotic consensus under the $(F+1, F+1)$-robustness
condition, we need to prove the following two conditions in the definition
of Resilient_asymptotic_consensus: (i) $\forall t$, $m(0) \leq m(t) \wedge M(t) \leq M(0)$, and
(ii) $\exists L$, $\forall i$, $i \in \mathcal{N} \implies \lim_{t \to \infty} x_i(t) = L$. We state the first subproof as the lemma
interval_bound in Coq. The proof of lemma interval_bound is a
consequence of Lemma 1. We prove this lemma by induction on time $t$ and
then apply Lemma 1 to complete the proof.

We prove the second subproof by contradiction in Coq. To start the proof by
contradiction, we assume that the limits $A_M$ and $A_m$ of the maximum
and minimum functions $M(t)$ and $m(t)$ are different. We then instantiate the sets
$S_1$ and $S_2$ in the definition of $(r, s)$-robustness with $\mathcal{X}_M(t_\epsilon, \epsilon_0)$ and $\mathcal{X}_m(t_\epsilon, \epsilon_0)$
respectively, where $\mathcal{X}_M(t, \epsilon) = \{i \in \mathcal{V} : x_i(t) > A_M - \epsilon\}$ and
$\mathcal{X}_m(t, \epsilon) = \{i \in \mathcal{V} : x_i(t) < A_m + \epsilon\}$. In Coq, we define the sets $\mathcal{X}_m$ for any epsilon and t as
follows:


Definition X_m_t_e_i (e_i: R) (A_m:R) (t:nat) (mal : nat → D → R) (init : D → R)
  (A: D → bool) (w: nat → D * D → R) :=
  [set i:D | Rlt_dec (x mal init A w t i) (A_m + e_i)].


where Rlt_dec is Coq's standard decidability lemma for the less-than operation.

We need to prove that the sets $\mathcal{X}_M$ and $\mathcal{X}_m$ are disjoint at all times until we
reach a point when either $\mathcal{X}_M$ or $\mathcal{X}_m$ is empty. This requires us to prove the
following lemma in Coq:


Lemma X_M_X_m_disjoint_at_j
  (mal: nat → D → R) (init: D → R) (A: D → bool) (w: nat → D * D → R):
  ∀ (t_eps l:nat) (a A_M A_m:R) (eps_0 eps:posreal),
  (A_M - (eps_j l eps_0 eps a) > A_m + (eps_j l eps_0 eps a)) →
  [disjoint (X_M_t_e_i (eps_j l eps_0 eps a) A_M (t_eps+l) mal init A w) &
   (X_m_t_e_i (eps_j l eps_0 eps a) A_m (t_eps+l) mal init A w)].


Since $\mathcal{X}_M(t_\epsilon + l, \epsilon_l)$ is the set of all nodes with values at least $A_M - \epsilon_l$ and
$\mathcal{X}_m(t_\epsilon + l, \epsilon_l)$ is the set of all nodes with values at most $A_m + \epsilon_l$, these two sets are disjoint
if $A_M - \epsilon_l > A_m + \epsilon_l$. For $l = 0$, we have defined $\epsilon_0$ such that
$A_M - \epsilon_0 > A_m + \epsilon_0$. To prove that $A_M - \epsilon_l > A_m + \epsilon_l$, $\forall l$, $0 < l$, we need to show that
$A_M - \epsilon_l > A_M - \epsilon_0$ and $A_m + \epsilon_0 > A_m + \epsilon_l$. This would indeed require us to
show that $\epsilon_l < \epsilon_0$, $\forall l$, $0 < l$. This holds since we had defined $\epsilon_l$ recursively as
$\epsilon_l := \alpha\,\epsilon_{l-1} - (1 - \alpha)\,\epsilon$.
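
To see why $\epsilon_l < \epsilon_0$ follows, it may help to unroll the recursion. The derivation below is a sketch that assumes the recursion reads $\epsilon_l = \alpha\,\epsilon_{l-1} - (1-\alpha)\,\epsilon$ with $0 < \alpha < 1$ and $\epsilon > 0$, and that $\epsilon$ satisfies the bound $\epsilon < \frac{\alpha^N}{1-\alpha^N}\,\epsilon_0$ which appears as a hypothesis in the Coq lemmas of this section.

```latex
\begin{align*}
\epsilon_l &= \alpha\,\epsilon_{l-1} - (1-\alpha)\,\epsilon
  && \text{(assumed recursion)} \\
\epsilon_l &= \alpha^{l}\,\epsilon_0 - (1-\alpha^{l})\,\epsilon
  && \text{(closed form, by induction on } l\text{)} \\
\epsilon_l - \epsilon_{l-1} &= -(1-\alpha)\,(\epsilon_{l-1} + \epsilon) < 0
  && \text{(whenever } \epsilon_{l-1} > 0\text{)} \\
\epsilon_l > 0 &\iff \epsilon < \frac{\alpha^{l}}{1-\alpha^{l}}\,\epsilon_0
  && \text{(for } l \geq 1\text{)}
\end{align*}
```

Since $\frac{\alpha^{l}}{1-\alpha^{l}}$ decreases as $l$ grows, the bound stated for $N$ keeps every $\epsilon_l$ with $1 \leq l \leq N$ positive, and the third line then gives a strictly decreasing sequence $\epsilon_0 > \epsilon_1 > \dots > \epsilon_N > 0$.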

A crucial aspect of the sufficiency proof is proving that $(F+1, F+1)$-robustness
implies that there exists a node in the union of the sets $\mathcal{X}_M \cap \mathcal{N}$ and
$\mathcal{X}_m \cap \mathcal{N}$ such that it has at least $F+1$ neighboring nodes outside its set. This was particularly
challenging because in the original paper [30], the authors do not use all three
conditions in the definition of $(F+1, F+1)$-robustness to informally
prove the implication. They use only the third condition
($F+1 \leq |\mathcal{X}^{F+1}_{S_1}| + |\mathcal{X}^{F+1}_{S_2}|$) to state the implication, while leaving it to the reader to connect
the missing dots with the first two conditions. For the implication to hold, all
three conditions in the definition of $(F+1, F+1)$-robustness should imply the
existence of such a node, since there is an "or" in the definition of $(F+1, F+1)$-robustness
connecting the three conditions. To prove the implication from the
first two conditions, we first need to prove the existence of a normal node in the
sets $\mathcal{X}_M$ and $\mathcal{X}_m$ for all $l \in \mathbb{N}$. This holds since the node $i$ with value $M(t_\epsilon + l)$
will always be above the threshold $A_M - \epsilon_l$ because $M(t) \geq A_M$, $\forall t$, due to the
existence of the limit $A_M$. Hence, $0 < |\mathcal{X}_M(t_\epsilon + l, \epsilon_l)|$, $\forall l \in \mathbb{N}$. Since the first
condition of $(F+1, F+1)$-robustness states that
$|\mathcal{X}^{F+1}_{\mathcal{X}_M(t_\epsilon + l, \epsilon_l)}| = |\mathcal{X}_M(t_\epsilon + l, \epsilon_l)|$, we have
$0 < |\mathcal{X}^{F+1}_{\mathcal{X}_M(t_\epsilon + l, \epsilon_l)}|$. Hence by definition of
$\mathcal{X}^{F+1}_{\mathcal{X}_M(t_\epsilon + l, \epsilon_l)}$, there exists a normal node
in the set $\mathcal{X}_M(t_\epsilon + l, \epsilon_l)$ such that it has at least $F+1$ neighboring nodes outside $\mathcal{X}_M(t_\epsilon + l, \epsilon_l)$.

We prove this formally in Coq using the following lemma statement:


Lemma X_m_normal_exists_at_j (t_eps l N: nat) (a A_m: R) (eps_0 eps:posreal)
  (mal: nat → D → R) (init : D → R) (A: D → bool) (w: nat → D * D → R):
  F_total_malicious mal init A w →
  wts_well_behaved A mal init w →
  (0 < F+1 ≤ |D|) →
  is_lim_seq [eta m mal init A w] A_m →
  (0 < N) → (l < N) → (0 < a < 1) → (eps < a^N / (1 - a^N) * eps_0) →
  ∃ i:D, i ∈ (X_m_t_e_i (eps_j l eps_0 eps a) A_m (t_eps + l) mal init A w) ∧
  i ∈ Normal A.


By symmetry, we prove that $0 < |\mathcal{X}^{F+1}_{\mathcal{X}_m(t_\epsilon + l, \epsilon_l)}|$. The other part that was not
explicit in the paper proof in [30] was that the largest
value that the node $i$ uses at time step $t_\epsilon + l$ is $M(t_\epsilon + l)$, which is stated
without proof. This was a challenge during our formalization. To formally prove
this we had to split the neighbor set of $i$ into two parts depending on their
relative position with respect to $i$. While it is easy to bound the values of the
nodes positioned to the left of $i$ by $M(t_\epsilon + l)$, since the neighboring list is
assumed to be sorted at the time of update and we have established this upper
bound for any normal node from Lemma 1, bounding the values for the nodes
positioned to the right of the normal node $i$ was not trivial. We proved this using
a case analysis on the cardinality of the set $\mathcal{R}_i^{>}(t)$. In Coq, we formally prove
this using the lemma x_right_ineq_1. We do not expand on
this lemma here for brevity.

Another challenge during the formalization was using the bound of the neighboring
node of $i$, $A_M - \epsilon_l$, in the update of the value of $i$ at the next time step.
We know that the neighbors outside the set $\mathcal{J}_i(t_\epsilon + l) \cap \mathcal{X}_M(t_\epsilon + l, \epsilon_l)$ have value
at most $A_M - \epsilon_l$. But to use these nodes in the update function, we need to show
that these neighboring nodes are in the inclusive set of the normal node $i$ minus
the extremes, i.e., there exists a node in the intersection of the set $\mathcal{J}_i(t_\epsilon + l)$ and
the set $s$ which contains nodes outside the set $\mathcal{J}_i(t_\epsilon + l) \cap \mathcal{X}_M(t_\epsilon + l, \epsilon_l)$. We prove
the existence of such a node using the following lemma statement in Coq:


Lemma exists_in_intersection: ∀ (A B: {set D}) (s: seq D) (F:nat),
  |s| = (F+1)%N → (|B| ≤ F)%N →
  {subset s <= A ∪ B} → ∃ x:D, x ∈ [set x | x ∈ s] ∩ A.


We instantiate the set A with $\mathcal{J}_i \setminus \mathcal{R}_i(t)$ and the set B with $\mathcal{R}_i^{<}(t)$. We know
that by definition of the W-MSR algorithm, $|\mathcal{R}_i^{<}(t)| \leq F$. To use the lemma
exists_in_intersection, we first had to prove that $s \subseteq (\mathcal{J}_i \setminus \mathcal{R}_i(t)) \cup \mathcal{R}_i^{<}(t)$.
Applying the lemma exists_in_intersection then gives us a node $k$ as a witness
which lies in the intersection of the set $s$ and $\mathcal{J}_i \setminus \mathcal{R}_i(t)$. We use this node
to apply the bound $A_M - \epsilon_l$ in the proof of inequality 1 for $l < N$. All other
nodes in the neighboring list of the normal node $i$ minus extremes are shown to
be bounded by $M(t)$.
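
The counting argument behind this lemma is a pigeonhole principle, which the following Python sketch illustrates on finite sets. It is a hedged illustration, not the Coq proof: the function name `witness_in_intersection` is hypothetical, and `|s|` is read as the number of distinct elements of `s`.

```python
def witness_in_intersection(A, B, s, F):
    """Return an element of s that lies in A, given the lemma's hypotheses."""
    distinct = set(s)
    if len(distinct) != F + 1 or len(B) > F or not distinct <= (A | B):
        raise ValueError("hypotheses of the lemma are not met")
    # B can absorb at most F of the F+1 distinct elements of s,
    # so at least one element of s falls outside B and hence inside A.
    return next(x for x in s if x in A)


A = {1, 2, 3}
B = {4, 5}
witness = witness_in_intersection(A, B, [4, 5, 2], F=2)
```

Here `s` has three distinct elements and `B` only two, so one element of `s` (node 2) must come from `A`.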

To show that the inequality $\exists t$, $M(t) < A_M \vee A_m < m(t)$ holds, we need to
prove that for every $l$ such that $l \leq N$, the cardinality of the set $\mathcal{X}_M$ decreases
or the cardinality of the set $\mathcal{X}_m$ decreases or both under the $(F+1, F+1)$-robustness
condition. This requires us to prove the following lemma in Coq:


Lemma sj_ind_var (s1 s2: nat → nat) (N:nat): (0 < N) → (s1 1 + s2 1 ≤ N) →
  (∀ l:nat, (0 < l) → (l ≤ N) → (0 < s1 l) → (0 < s2 l) →
  (s1 l ≤ s1 l.-1) ∧ (s2 l ≤ s2 l.-1) ∧ ((s1 l < s1 l.-1) ∨ (s2 l < s2 l.-1))) →
  ∃ T:nat, (T ≤ N) ∧ (s1 T = 0 ∨ s2 T = 0).


We instantiate s1 and s2 with the cardinalities of $\mathcal{X}_M(t_\epsilon + l, \epsilon_l)$ and
$\mathcal{X}_m(t_\epsilon + l, \epsilon_l)$ respectively. We
use the lemma sj_ind_var to arrive at a contradiction and complete the proof
of the sufficiency.


3.3 Proof of necessity 


Lemma 3. [30] Consider a time-invariant network modeled by a digraph
$\mathcal{D} = (\mathcal{V}, \mathcal{E})$ where each normal node updates its value according to the W-MSR
algorithm with parameter $F$. Under the $F$-total malicious model, if resilient
asymptotic consensus is achieved then the network is $(F+1, F+1)$-robust.

Necessity is a secondary, but still significant lemma. It tells us that there is no
weaker condition than $(F+1, F+1)$-robustness such that the normal nodes
within the network reach asymptotic consensus. We now discuss an informal
proof of Lemma 3. Note that the original paper [30] does not provide a clean proof
of this lemma. For example, the original paper provides a sketch of the proof of
Lemma 3 by contraposition, but does not provide a concrete counterexample to
discharge the proof. The paper proof in [30] does not discuss the
construction of weights or the proof that these weights are not well-behaved
under non-$(r, s)$-robustness. These issues were non-trivial and posed challenges
in Coq, as will be explained in this section. We also highlight challenges in the
construction of this counterexample and the proof of necessity in Coq, including
an issue of mutual recursion in Coq. The missing details in the
original paper proof, which we had to develop explicitly, make the proof in this
paper an original contribution.


Proof. We proceed by proving the contrapositive of necessity, that is: if the
network is not $(F+1, F+1)$-robust then it does not achieve resilient asymptotic
consensus. Assuming that the network is not $(F+1, F+1)$-robust we know that
there are non-empty sets $S_1, S_2 \subset \mathcal{V}$ such that $S_1 \cap S_2 = \emptyset$,
$|\mathcal{X}^{F+1}_{S_1}| \neq |S_1|$, $|\mathcal{X}^{F+1}_{S_2}| \neq |S_2|$, and
$|\mathcal{X}^{F+1}_{S_1}| + |\mathcal{X}^{F+1}_{S_2}| < F+1$. It follows that $|\mathcal{X}^{F+1}_{S_1}| < F+1$
and $|\mathcal{X}^{F+1}_{S_2}| < F+1$. Also recall that $\mathcal{X}^{F+1}_{S_1} \subseteq S_1$ and
$\mathcal{X}^{F+1}_{S_2} \subseteq S_2$. One way of
interpreting this condition is that the number of nodes within $S_1$ and $S_2$ that
can receive a lot of information from outside of their respective sets is less than
$F+1$ in total, and less than the number of nodes in each set respectively. We seek
to construct a set of adversaries, initial values, malicious functions, and weights
such that resilient asymptotic consensus is not achieved. In particular we seek to
prove that there exist two normal nodes $i, j$ such that
$\lim_{t \to \infty} x_i(t) \neq \lim_{t \to \infty} x_j(t)$.
We discuss the details of the proof in Appendix D of the extended version [45].


Formalization in Coq: We formalize Lemma 3 in Coq as


Lemma necessity_proof:
  nonempty_nontrivial_graph →
  (¬ r_s_robustness (F + 1) (F + 1) →
  ¬ (∀ (A:D → bool) (mal:nat → D → R) (init:D → R) (w:nat → D * D → R),
  wts_well_behaved A mal init w →
  Resilient_asymptotic_consensus A mal init w)).


Formalization of the necessity proof exposed some inconsistencies in definitions
in the original paper [30]. In particular, the paper defines the three conditions
on weights that we discussed in Section 2.4 only for normal nodes. During
our formalization, we found this to be restrictive. Those conditions on weights
should hold for any node. The reason for applying the conditions in the paper to
the weights of adversary nodes is that, in order to ensure that a node $i \in \mathcal{A}$
is malicious, as defined in the paper, there must exist a time $t$ such that
$x_i(t+1) \neq \sum_{j \in \mathcal{J}_i \setminus \mathcal{R}_i(t)} w_{ij}(t)\, x_j(t)$. In other words, at some time the
value emitted by a given node must not equal the value it would emit if it was
normal, but the sum is clearly undefined if the weights of an adversary node are
undefined. Therefore, we relax the condition that the set of weights described
in the paper only exists for normal nodes. Fortunately this does not create a
problem as adversary nodes can update their values according to any function
they wish, meaning that they do not have to use the described set of weights, or
any weights at all, leaving their values unconstrained by this condition.

Another thing that was not explicit in the original paper [30] was the right
placement of quantifiers. Formalizing the proof of necessity helped us identify
the right placement of quantifiers and provide an accurate formal specification
for the W-MSR algorithm. At the start of our formalization it was not
clear to us whether the paper meant to imply that:


(∀ (A:D → bool) (mal:nat → D → R) (init:D → R), wts_well_behaved A mal init →
  (Resilient_asymptotic_consensus A mal init → r_s_robustness (F + 1) (F + 1))).


Or:


(∀ (A:D → bool) (mal:nat → D → R) (init:D → R),
  wts_well_behaved A mal init →
  Resilient_asymptotic_consensus A mal init) ↔ r_s_robustness (F + 1) (F + 1).


In the first formula, the quantified values A, mal, init are not bound to the
definition of resilient asymptotic consensus. Therefore, in the necessity proof,
we cannot construct a counterexample by appropriate instantiation of A, mal
and init, to discharge the proof by contradiction. In the second formula, the
quantified values are bound to the definition of resilient asymptotic consensus,
which allows us to construct the counterexample by propagating the negation
through the quantified values. Essentially, the difference is between the formulae
(∀X. P(X) → Q(X)) and ((∀X. P(X)) → (∀X. Q(X))), where X represents the
tuple (A, mal, init), and the first statement is stronger. Therefore, the former,
stronger condition is not necessarily true in the necessity direction, while the
weaker latter condition is.
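
The difference in strength between the two quantifier placements can be checked exhaustively on a tiny finite model. The Python sketch below is purely illustrative: the two-element domain and the dictionary encoding of the predicates P and Q are assumptions made for the example and have no counterpart in the Coq development.

```python
from itertools import product

DOMAIN = (0, 1)


def forall_impl(P, Q):
    """(forall X. P(X) -> Q(X))"""
    return all((not P[x]) or Q[x] for x in DOMAIN)


def impl_of_foralls(P, Q):
    """((forall X. P(X)) -> (forall X. Q(X)))"""
    return (not all(P[x] for x in DOMAIN)) or all(Q[x] for x in DOMAIN)


# Every predicate on the two-element domain, encoded as a dict.
predicates = [dict(zip(DOMAIN, bits))
              for bits in product((False, True), repeat=len(DOMAIN))]

# The first formula implies the second for every choice of P and Q ...
first_implies_second = all(impl_of_foralls(P, Q)
                           for P in predicates for Q in predicates
                           if forall_impl(P, Q))

# ... but not conversely: these pairs satisfy only the second formula.
only_second = [(P, Q) for P in predicates for Q in predicates
               if impl_of_foralls(P, Q) and not forall_impl(P, Q)]
```

For instance, with P true only at 0 and Q false everywhere, the second formula holds vacuously (P is not true for all X) while the first fails at X = 0.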

Another difficulty we encountered was defining the weights in such a way
that $w_{ij}(t) = \frac{1}{|\mathcal{J}_i \setminus \mathcal{R}_i(t)|}$. This is a result of Coq's sensitivity to ill-defined recursion.
The issue arises because defining $w_{ij}$ at time $t$ requires knowing the value of $x_i$
at time $t$; however, as we had defined $x_i$, it takes the set of weights it uses as a
parameter, even though mathematically there is no issue since $x_i(t)$ only relies
on the values of $x_i(t-1)$ and $w_{ij}(t-1)$. In order to solve this issue we defined
a function which returns a pair of functions $(x_i, w_{ij})$. In order to ensure that
Coq could guess the parameter being recursed on we also had to add another
parameter $two_t$ which is initialized as $2 \cdot t$, and ensure that the pair $(x_i(t), w_{ij}(t))$
is returned when $two_t = 2 \cdot t$, and $(x_i(t+1), w_{ij}(t))$ is returned when $two_t = (2 \cdot t) + 1$.


3.4 Formal proof of the main theorem 


We state the main theorem (Theorem 1) in Coq as:


Theorem F_total_consensus:
  nonempty_nontrivial_graph →
  (0 < F+1 ≤ |D|)%N →
  (∀ (A:D → bool) (mal:nat → D → R) (init:D → R) (w:nat → D * D → R),
  wts_well_behaved A mal init w →
  Resilient_asymptotic_consensus A mal init w) ↔ r_s_robustness (F + 1) (F + 1).


We close the proof of F_total_consensus by splitting the theorem into sufficiency
and necessity sub-proofs and applying the lemmas sufficiency_proof
and necessity_proof. The only detail worth noting is that necessity_proof
relies on the decidability of r_s_robustness, for which we need the axiom of the
excluded middle.


4 Related Work 


Recently there has been a growing interest in the formalization of distributed 
systems and control theory, using both automated and interactive verification 
approaches. 

Some notable works in the area of automated verification use model checking,
temporal logic, and reachability techniques. For instance, Cimatti et al. [11] have
used model checking techniques to formally verify the implementation of a part
of the safety logic for a railway interlocking system. Scherer et al. [43] extended the
JavaPathFinder [24] model checker to support modeling of a real-time scheduler
and a physical system defined by differential equations. They verify
the safety and liveness properties of a control system, and also check for
programming errors. Besides model checking, temporal logic based techniques
have been applied to control synthesis [40], robust model predictive control [14]
and automatic verification of sequential control systems [35]. Other approaches
for verifying safety use reachability methods like flow pipe approximations [10],
zonotope approximation algorithms [19, 28, 2], and ellipsoidal calculus [4].

There has also been significant work in the formalization of control theory
using interactive theorem provers [39, 1, 38]. In the area of formalization of
stability analysis for control theory, Cyril Cohen and Damien Rouhling formalized
LaSalle's principle in Coq [12]. Stability is important for the control of
dynamical systems since it guarantees that trajectories of dynamical systems like
cars and airplanes are bounded. Chan et al. [7] formalize safety properties like
Lyapunov stability and exponential stability of cyber-physical systems in Coq.
In [39], Damien Rouhling formalized the soundness of a control function [32]
for an inverted pendulum. Some works have also emerged in the area of signal
processing for controls. Gallois-Wong et al. [17] formalized some error analysis
theorems about digital filters in Coq. Araiza-Illan et al. [3] formally verified
high-level properties of control systems such as stability, feedback gain, or robustness
using the Why3 tool [15]. Rashid et al. [38] formalized the transform methods in
HOL Light [22]. Transform methods are used in signal processing and controls
to switch between the time domain and the frequency domain for design and
analysis of control systems. A few works have emerged in the area of formalization
of feedback control theory to guarantee robustness of control systems.
Jasim and Veres [26] formally proved one of the most fundamental and general
results for nonlinear feedback systems, the Small-gain theorem (SGT),
using Isabelle/HOL [37]. Hasan et al. [23] formalized the theoretical foundations
of feedback controls in HOL Light. Another notable work in the formalization of
control systems is the formalization of safety properties of robot manipulators
by Affeldt et al. [1].

Most of the above works deal with the problem of formalizing the theoretical
foundations of control theory: stability analysis, transform methods, filtering
algorithms for signal processing, and feedback control design. But, to our knowledge,
none of these works tackles the problem of consensus in a formal setting. Given
that consensus is a quantity of interest in distributed control applications, our
work on the formalization of the W-MSR algorithm is a first step towards
formally verified distributed control systems.


5 Conclusion 


In this work, we formalize a consensus algorithm [30] for distributed controls
in Coq. We formally prove the necessary and sufficient conditions for a set of
normal nodes in the network to achieve asymptotic consensus in the presence of
a fixed bound on the number of malicious nodes in the network. During the process
of formalization we discovered several areas where the proof in the original paper is imprecise,
especially when defining the lemma statements of sufficiency and necessity. In
particular, the order of quantifiers on some variables was unclear, and we had to
spend time clarifying their order. We also prove a stronger version of the sufficiency
condition than the original theorem requires. This is done to ensure that
the conditions in both directions of the double implication hold. The definitions
and lemmas we formalize in this paper can be used for verifying consensus for
other threat models described in the original paper [30]. Overall our work is
the first of its kind to provide formal specifications of a consensus algorithm in
distributed controls. The total length of the Coq proofs is about 11 thousand lines
of code. The entire formalization took us 6 person-months.

A possible future direction of work is to verify the implementation of the 
algorithm. The proof of this algorithm in the original paper [30], and our for- 
malization assume that all computations are in the real field. However, an actual 
implementation would need to use finite precision arithmetic. It would therefore 
be interesting to study the effect of finite precision on the robustness of this al- 
gorithm. It would also be interesting to formalize the algorithm for time-variant 
networks in which the edge relation between the nodes can change with time. 
Possible use cases for such a network model are drone swarms for military and
rescue operations, in which each drone in the network could be expected to 
dynamically change the flow of information from its neighbors.

Abstract. In this experience report, we present the complete formal
verification of a Java implementation of in-place superscalar sample sort
(ips⁴o) using the KeY program verification system. As ips⁴o is one of
the fastest general purpose sorting algorithms, this is an important step 
towards a collection of basic toolbox components that are both provably 
correct and highly efficient. At the same time, it is an important case 
study of how careful, highly efficient implementations of complicated 
algorithms can be formally verified directly. We provide an analysis of 
which features of the KeY system and its verification calculus are in- 
strumental in enabling algorithm verification without any compromise 
on algorithm efficiency. 


1 Introduction 


The core task of computer scientists can be seen as writing correct and efficient 
computer programs. However, although both correctness and efficiency have been
intensively studied, there is comparatively little work on fully combining both features.
We would like formally verified code that is efficient on modern machines.
We believe that a library of verified high-performance implementations of the 
basic toolbox of most frequently used algorithms and data structures is a cru- 
cial step towards this goal: often, these components take a considerable part of 
the overall computation time, and they have a simple specification which allows 
reusing their verified functionality in a large number of programs. Since the re- 
maining code may be simpler from an algorithmic point of view, verifying such 
programs could thus be considerably simplified. 

To make progress in this direction, we perform a case study on sorting, which 
is one of the most frequently used basic toolbox algorithms. For example, a
recent study identified hundreds of performance-relevant sorting calls in Google's
central software depot [36]. Taking the correctness of even standard library routines
for granted is also not an option. For example, during an attempt to verify
OpenJDK's built-in TimSort routine with the KeY verifier, researchers were
able to detect a bug [11].

Although some sorters have been formally verified [A420], it turns out that
these do not achieve state-of-the-art performance because only rather simple
combinations and variants of quicksort, mergesort, or heapsort have been used
that lack cache efficiency when applied to large data sets and have performance
bottlenecks that limit instruction parallelism. The best available sorters are considerably
more complex (about 1000 lines of code) and even more likely to contain
bugs when not formally verified. Moreover, previous verifications do not prove
all required properties or they operate only on an abstraction of the code, which
makes it difficult to relate to highly tuned implementations.

© The Author(s) 2024

For our verification of a state-of-the-art sorter, we consider ips⁴o (in-place
super scalar sample sort) [2]. Sample sort generalises quicksort by partitioning
the data into many pieces in a single pass over the data, which makes it
more cache efficient (indeed I/O-optimal up to lower order terms). Additionally,
ips⁴o works in-place (an important requirement for standard libraries and large
inputs), avoids branch mispredictions, and allows high instruction parallelism
by reducing data dependencies in the innermost loops. The algorithm also has
an efficient parallelisation and parts of it can be used for fast integer sorting
[2,36]. Extensive experiments indicate that a C++ implementation of ips⁴o considerably
outperforms quicksort, mergesort and heapsort on large inputs and is
several times faster than adaptive sorters such as TimSort on inputs that are not
already almost sorted [2]. Our experiments in Sec. 5 indicate that the verified
Java implementation is 1.3 to 1.8 times faster than the standard library sorter
of OpenJDK 20 for large inputs on three different architectures.

We use the Java Modeling Language (JML) to directly specify the efficient
Java implementation of sequential ips⁴o. We obtain a largely automated
proof using the KeY theorem prover [1], in part aided by external theory solvers
(in particular Z3 [33]) and KeY's support for interactively guiding the proof construction
process. This yields a full functional correctness proof of the full Java
implementation of ips⁴o showing, for all possible inputs, sortedness, the permutation
property, exception safety, memory safety, termination, and absence of
arithmetic overflows. The complete 8-line specification of the top-level sorting
method can be seen in Fig. 1.

The verified code is available for download¹ and can easily be used in real-world
Java applications (through the Maven packaging mechanism). It spans
over 900 lines of Java code with the main properties specified on 8 lines of
JML, annotated with some 2500 lines of JML auxiliary annotations for prover
guidance. The project required a total of 1 million proof steps (of which 4000
were performed manually) on 179 proof obligations (with one or more proof
obligations per Java method). The project required about 4 person months.

The verification revealed a subtle bug in the original version, where the algorithm
would not terminate if presented with an array containing the same single
value many times.² This flaw was subsequently fixed. Moreover, the formal verification
revealed that the code could be simplified at one point.

This case study demonstrates that competitive code hand-optimised for the
application on modern processors can be deductively verified within a reasonable
time frame. It resulted from a fruitful collaboration of experts in program
verification and experts in algorithm engineering. An extended version of this
paper [3] is available containing more in-depth information about the specification
and verification.


1 at the GitHub repository https://github.com/KeYProject / pete

2 The bug was latently present in the original C++ code also. However, it cannot
occur when the default parameter values are used in C++.


2 Background 


2.1 Formal Specification with the Java Modeling Language 


The Java Modeling Language (JML) is a behavioural interface specifica- 
tion language [15] following the paradigm of design-by-contract [29]. JML is 
the de-facto standard for the formal specification of Java programs. The main
artefacts of JML specifications are method contracts comprising preconditions
(specified via requires clauses), postconditions (ensures) and a frame condition
(assignable) which describes the set of heap locations to which a method
invocation is allowed to write. A contract specifies that, if a method starts in 
a state satisfying its preconditions, then it must terminate and the postcondi- 
tion must be satisfied in the post-state of the method invocation. Additionally, 
any modified heap location already allocated at invocation time must lie within 
the specified assignable clause. Termination witnesses (measured_by clauses) are 
used to reason about the termination of recursive methods. Java loops can be 
annotated with invariants (loop_invariant), which must be true whenever the 
loop condition is evaluated, termination witnesses (decreases), and frame con- 
ditions (assignable) that limit the heap locations the loop body may modify. 
Loop specifications and method contracts of internal methods allow one to con- 
duct proofs modularly and inductively. 
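
To make the contract and loop-specification clauses above concrete, the following minimal sketch annotates a small array-maximum method. The class and method names are hypothetical and are not taken from the verified code base; the annotations use the clauses named above but have not been checked with KeY here. Because all JML lives in comments, the file compiles with a standard Java compiler.

```java
final class JmlContractSketch {

    /*@ public normal_behaviour
      @ requires a != null && a.length > 0;
      @ ensures (\forall int j; 0 <= j < a.length; a[j] <= \result);
      @ ensures (\exists int j; 0 <= j < a.length; a[j] == \result);
      @ assignable \nothing;
      @*/
    static int max(int[] a) {
        int m = a[0];
        /*@ loop_invariant 1 <= i && i <= a.length;
          @ loop_invariant (\forall int j; 0 <= j < i; a[j] <= m);
          @ loop_invariant (\exists int j; 0 <= j < i; a[j] == m);
          @ decreases a.length - i;
          @ assignable \nothing;
          @*/
        for (int i = 1; i < a.length; i++) {
            // m is only a local variable, so no heap location is written.
            if (a[i] > m) {
                m = a[i];
            }
        }
        return m;
    }
}
```

The loop invariants are the inductive counterparts of the two postconditions: on loop exit, i equals a.length, and the invariants then coincide with the ensures clauses.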

Expressions in JML are a superset of side-effect-free Java expressions. In 
particular, JML allows the use of field references and the invocation of pure 
methods in specifications. JML-specific syntax includes first-order quantifiers 
(\forall and \exists) and generalised quantifiers. One generalised quantifier 
is the construct (\num_of T x; φ) which evaluates to the number of elements
of type T that satisfy the condition φ (if that number is finite). (\sum T x; φ;
e) sums the expression e over all values of type T satisfying φ. Quantifiers in
JML support range predicates to constrain the bound variable; the expression
(\forall T x; φ; ψ) is hence equivalent to (\forall T x; φ ==> ψ).

JML specifications are annotated in the Java source code directly and en- 
closed in special comments beginning with /*@ or //@ to allow them to be com- 
piled by a standard Java compiler. JML supports the definition of verification- 
only (model and ghost) entities within JML comments that are only visible at 
verification time and do not influence runtime behaviour (see also Sec. 4.1).

Fig. 1 shows the specification of the top-level sort method as an example.
Since that JML contract is labelled normal_behaviour, it requires (in addition
to satisfying the pre-post contract) that the method does not terminate abruptly
by throwing an exception.

1 /*@ public normal_behaviour
2   @ requires v.length <= MAX_LEN;
3   @ ensures seqPerm(array2seq(v), \old(array2seq(v)));
4   @ ensures (\forall int i; 0 <= i < v.length-1; v[i] <= v[i+1]);
5   @ assignable v[*];
6   @*/
7 public static void sort(int[] v) { ... }


Fig. 1: Specification of the sorting entry method specifying that after the method
call, the array contains a permutation of the input values (line 3) and
is sorted (quantified expression in line 4). Only entries in the array are modified
in the process (line 5).


2.2 Deductive Verification with the KeY System 


The KeY verification tool [1] is a deductive theorem prover which can be used to
verify Java programs against JML specifications. KeY translates JML specifications
into proof obligations formalised in the dynamic logic [13] variant JavaDL,
in which Java program fragments can occur within formulas. The JavaDL formula
φ → ⟨o.m();⟩ψ is similar to the total Hoare triple [φ] o.m(); [ψ], with
both stating that the method invocation o.m() terminates in a state satisfying
ψ if started in a state satisfying φ. Proofs in KeY are conducted by applying
inference rules in a sequent calculus. Using a set of inference rules for Java
statements, the Java code (⟨o.m();⟩ψ in the above statement) is symbolically
executed such that the approach yields the weakest precondition for o.m() and
ψ as a formula in first-order predicate logic. KeY can settle many proof obligations
automatically, but also allows interactive rule application and invocation
of external provers like satisfiability modulo theories (SMT) solvers.
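
As a small worked illustration of the formula shape described above (a hand-made example, not taken from the paper), consider a single assignment to an integer program variable x with precondition x ≥ 0 and postcondition x > 0. Symbolic execution substitutes the assigned expression into the postcondition, which leaves a purely first-order obligation:

```latex
% JavaDL-style formula: precondition implies <program> postcondition
\[
  x \ge 0 \;\rightarrow\; \langle \texttt{x = x + 1;} \rangle\, (x > 0)
\]
% Weakest precondition of the assignment with respect to x > 0:
\[
  x + 1 > 0
\]
% Remaining first-order proof obligation:
\[
  x \ge 0 \;\rightarrow\; x + 1 > 0
\]
```

Over the mathematical integers this obligation is valid. Under Java's wrap-around int arithmetic it would fail for x = 2^31 - 1, which is one reason why the absence of overflows is treated as a property in its own right.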


3 Our Java Implementation of ips⁴o


3.1 The Algorithm 


In-place (parallel) super scalar sample sort (ips⁴o) is a state-of-the-art general
sorting algorithm [2]. Sample sorting can be seen as a generalisation of quicksort,
where instead of choosing a single pivot to partition elements into two parts, we
choose a sorted sequence of k - 1 splitters which define k buckets consisting
of the elements lying between adjacent splitters. One advantage of this is the
reduced recursion depth and the resulting better cache efficiency. "Super-scalar"
refers to enabling instruction parallelism by avoiding branches and reducing data
dependencies while classifying elements into buckets. "In-place" means that the
algorithm needs only logarithmic space in addition to the input. Although ips⁴o
has a parallel version, this work is concerned with the sequential case.

The algorithm works by recursively partitioning the input into buckets; when
the sub-problems are small enough, they are sorted using insertion sort. The
maximum number of buckets kmax and the base-case size, i.e., the maximum
problem size for insertion sort, are configuration parameters. In our implementation,
we chose kmax = 256 and base-case size 128 experimentally. Partitioning
consists of four steps: sampling, classification, permutation, and cleanup.


Fig. 2: Overview of all steps of ips⁴o: (a) input with elements classified into the
four classes blue, green, orange and red, (b) after classification (B = 2); bucket
sizes are indicated by brackets and white elements are empty, (c) after permutation,
(d) the operations done by the cleanup step, (e) partitioned output.

Sampling. This step finds the splitters as equally spaced elements from a 
(recursively) sorted random sample of the current subproblem. There are special 
cases to handle small or skewed inputs. These are fully handled in our proof, 
but to simplify the exposition, we will assume in this summary that k = kmax
distinct³ splitters are found this way.

Classification. The goal of the classification step is two-fold: (1) to assign 
each element to one of the k buckets defined by the splitters, and (2) to pre- 
sort elements into fixed-size blocks such that all elements in a block belong to 
the same bucket. To find the right bucket for each element, the largest splitter 
element smaller than that element must be identified. A number of algorithm en- 
gineering optimisations make the classification efficient: it is implemented using 
an implicit perfect binary search tree with logarithmic lookup complexity. Moreover,
the tree data structure also supports an implementation without branching
statements and with unrolled loops that eliminates branch mispredictions and
facilitates high instruction parallelism and the use of SIMD instructions. We will come
back to this classification tree implementation in Sec. 4.2, where we discuss how
this efficiency choice was dealt with in the formal proof.

After classification is done, the input array consists of blocks in which all
elements belong to the same bucket, followed by some empty space, with the
remaining elements still in the (partially filled) buffers. The block size
B is chosen experimentally to be 1 […]. Fig. 2b shows the output of this step.

Permutation. By now, it is known how many elements are in each bucket,
and therefore where in the array each bucket begins and ends after partitioning
is done. The objective of the permutation step is to rearrange the blocks so that
each block starts in the correct bucket. Then, if the block is not already correctly
placed, it is moved to its bucket, possibly displacing another (incorrectly placed)
block, which is then similarly moved. Refer to Fig. 2c for the state of the input
array after this step.


3 If equal splitters do appear, duplicates are removed and equality buckets are used
that do not require recursive sorting. Details can be found in the extended version [3].

Cleanup. In general, bucket boundaries will not coincide with block bound- 
aries. Since the permutation step works on block granularity, there may be over- 
lap where elements spill into an adjacent bucket. These elements are corrected 
in the cleanup step. In addition, the remaining elements in the buffers from 
the classification step are written back into the input array. Fig. 2d shows an
example of the steps performed during cleanup. 
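
The overall recursion described above can be sketched as follows. This is a deliberately simplified, out-of-place illustration and not the verified implementation: it replaces blocks, buffers, permutation and cleanup by plain per-bucket arrays, uses a linear scan instead of the classification tree, and the class name and the constants K, BASE_CASE and OVERSAMPLING are hypothetical values chosen for readability.

```java
import java.util.Random;

final class SampleSortSketch {
    static final int K = 4;            // buckets per level (hypothetical; the paper uses up to 256)
    static final int BASE_CASE = 16;   // hypothetical; the paper uses 128
    static final int OVERSAMPLING = 4; // hypothetical sample-size factor
    private static final Random RNG = new Random(42);

    static void insertionSort(int[] a) {
        for (int i = 1; i < a.length; i++) {
            int x = a[i], j = i - 1;
            while (j >= 0 && a[j] > x) { a[j + 1] = a[j]; j--; }
            a[j + 1] = x;
        }
    }

    /** Returns a sorted copy of v (the real algorithm works in place). */
    static int[] sort(int[] v) {
        int[] a = v.clone();
        if (a.length <= BASE_CASE) { insertionSort(a); return a; }

        // Sampling: K - 1 equally spaced splitters from a sorted random sample.
        int[] sample = new int[OVERSAMPLING * K];
        for (int i = 0; i < sample.length; i++) sample[i] = a[RNG.nextInt(a.length)];
        insertionSort(sample);
        int[] splitters = new int[K - 1];
        for (int i = 0; i < K - 1; i++) splitters[i] = sample[(i + 1) * OVERSAMPLING];

        // Classification: bucket index = number of splitters smaller than the element.
        int[] cls = new int[a.length];
        int[] counts = new int[K];
        for (int i = 0; i < a.length; i++) {
            int b = 0;
            for (int s : splitters) if (s < a[i]) b++;
            cls[i] = b;
            counts[b]++;
        }
        // Termination guard for this sketch: if one bucket received everything
        // (e.g. many duplicates), recursion would not shrink the problem.
        for (int c : counts) if (c == a.length) { insertionSort(a); return a; }

        // Distribution: stand-in for the permutation and cleanup steps.
        int[][] buckets = new int[K][];
        for (int b = 0; b < K; b++) buckets[b] = new int[counts[b]];
        int[] fill = new int[K];
        for (int i = 0; i < a.length; i++) buckets[cls[i]][fill[cls[i]]++] = a[i];

        // Recursion: sort each bucket and concatenate in bucket order.
        int pos = 0;
        for (int b = 0; b < K; b++) {
            int[] sorted = sort(buckets[b]);
            System.arraycopy(sorted, 0, a, pos, sorted.length);
            pos += sorted.length;
        }
        return a;
    }
}
```

The guard before the distribution step is a crude substitute for the special-case handling mentioned above (equality buckets); without some such measure, an input consisting of one repeated value would recurse forever.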


3.2 Algorithm Engineering for Java 


While the original implementation of ips⁴o was written in C++, the verification
target of this case study is a translation by one of the authors of the original
code to Java. No performance-relevant compromises were made, e.g., to achieve
easier verification. We started with a Java implementation as close as possible to
the C++ implementation. We then performed profiling-driven tuning. Adjusting
configuration parameters improved performance by 12%. The only algorithmically
significant change resulting from tuning is when small sub-problems are
sorted. In the C++ implementation this is done during cleanup in order to improve
cache locality. In Java it turned out to be better to remove this special case,
i.e., to sort all sub-problems in the recursion step. This improved performance
by a further 4%.


4 Specification and Verification 


In this case study, the following properties of the Java ips⁴o implementation
have been specified and successfully verified: 


Sorting Property: The array is sorted after the method invocation. 

Permutation Property: The content of the input array after sorting is a per- 
mutation of the initial content. 

Exception Safety: No uncaught exceptions are thrown. 

Memory Safety: The implementation does not modify any previously allo- 
cated memory location except the entries of the input array. 

Termination: Every method invocation terminates. 

Absence of Overflows: During the execution of the method, no integer oper- 
ation will overflow or underflow. 


We assume that no out-of-memory or stack-overflow errors can ever occur at 
runtime. Since the algorithm is in-place, and the recursion depth is in O(log n), 
this is a reasonable assumption to make. 

Fig. 1 shows the JML specification of the entry method sort of the ips⁴o
implementation, i.e., the top-level requirements specification of the sorting algorithm.
The annotation normal_behaviour in line 1 specifies exception safety (i.e.,
the absence of both explicitly and implicitly thrown uncaught exceptions). Memory
safety is required by the framing condition in line 5. The permutation and
sorting properties are formulated as postconditions in lines 3 and 4, respectively.
Termination is required by default in JML (unless explicitly specified otherwise).
The absence of overflows is not specified in JML, but is an option that can be
switched on in KeY. The precondition in line 2 of the method contract ensures
that there are no overflows and is of little practical restriction since it is very
close to the maximum integer value (MAX_LEN = 2^31 - 256).
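
The following sketch illustrates why a length bound of the form 2^31 - 256 can rule out overflows. It is only a plausible illustration: the paper does not state which computation motivates the bound, and the class, the rounding helpers and the constant BLOCK are assumptions made for this example.

```java
final class OverflowSketch {
    static final int MAX_LEN = Integer.MAX_VALUE - 255; // = 2^31 - 256
    static final int BLOCK = 256;                       // hypothetical granularity

    /** Rounds n up to a multiple of BLOCK; throws ArithmeticException on int overflow. */
    static int roundUpChecked(int n) {
        return Math.addExact(n, BLOCK - 1) / BLOCK * BLOCK;
    }

    /** Same computation with Java's silent wrap-around semantics. */
    static int roundUpWrapping(int n) {
        return (n + BLOCK - 1) / BLOCK * BLOCK;
    }
}
```

For n = MAX_LEN the intermediate sum is exactly Integer.MAX_VALUE, so both variants agree; for n = MAX_LEN + 1 the checked variant throws, while the wrapping variant silently produces a negative result.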

The Java implementation of ips⁴o comprises 900 lines of code, annotated
with 2500 lines in JML. Besides the requirement specification, this comprises 
auxiliary specifications such as method contracts for (sub-)methods, class and 
loop invariants, function or predicate definitions and lemmata. We will focus 
on selected specification items and emphasise the algorithm's classification step 
since it has sophisticated, interesting loop invariants that are at the same time 
comprehensible, exemplifying the techniques we were using. 


4.1 Enabling KeY Features 


A few advanced features of KeY were essential for completing the proof. They 
are needed to abstract from sophisticated algorithmic concepts and to decompose 
larger proofs into more manageable units. 

We followed a mostly autoactive program verification approach [25] with as
much automation as possible while supporting interactive prover guidance in
the form of source code annotations (e.g. assertions). This concept has been widely
adopted throughout the program verification community [35,31,24,9]. Most program
verification tools only allow guidance by source code annotations. However,
the KeY theorem prover also supports an interactive proof mode in which inference
rules can be applied manually, and we also resorted to this way of proof
construction where needed.

Model methods. Due to the scale of the project, it was useful to encapsu- 
late important properties of the data structures into named abstract predicates 
or functions. The vehicle to formulate such abstractions in JML are model methods
[32], which are side-effect-free (pure) methods defined within JML annotations.
For ips⁴o, around 100 different model methods were used.

The benefits of using model methods are two-fold: (1) They structure and
decompose specifications, making them more comprehensible, and (2) they simplify,
or even enable, automated verification by abstraction of the proof state. An
example for a widely used (50 occurrences) model method is shown in Fig. 3.
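
The model method of Fig. 3 exists only at verification time. An executable counterpart, written here purely for illustration, shows what the \num_of expression computes and how per-element counts can express a permutation property. The class name and the isPermutation helper are hypothetical; the paper's top-level contract uses seqPerm instead.

```java
final class CountElementSketch {

    /** Executable reading of (\num_of int i; begin <= i < end; values[i] == e). */
    static int countElement(int[] values, int begin, int end, int e) {
        int count = 0;
        for (int i = begin; i < end; i++) {
            if (values[i] == e) count++;
        }
        return count;
    }

    /**
     * Two arrays of equal length are permutations of each other if every
     * element of a occurs equally often in both (quadratic, for clarity only).
     */
    static boolean isPermutation(int[] a, int[] b) {
        if (a.length != b.length) return false;
        for (int e : a) {
            if (countElement(a, 0, a.length, e) != countElement(b, 0, b.length, e)) return false;
        }
        return true;
    }
}
```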

Ghost fields and variables provide further abstractions from the memory 
state by defining verification-only memory locations. In the present case study, all 
Java classes except simple pure data containers required at least one ghost field. 
Sec. 4.2 reports a challenge where ghost variables and ghost code (i.e. assignments
to ghost variables) made verification possible in the first place.

Assertions are the main proof-guidance tool in autoactive verification as
they provide means to formulate intermediate proof targets that the automation
can discharge more easily and that thus may provide a deductive chain
completing the proof. This corresponds to making case distinctions or to introducing
intermediate goals in a manual proof. In the present case study, assertions
avoided many tedious interactive proof steps as the annotations in the source
code guide the proof search such that it now runs automatically.


/*@ public model_behaviour
  @ accessible values[begin..end - 1];
  @ static model int countElement(int[] values, int begin, int end, int e) {
  @   return (\num_of int i; begin <= i < end; values[i] == e); } */


Fig. 3: Model method that counts the occurrences of the integer element e in the
index range begin, ..., end - 1. The accessible clause specifies that the model
method may only read the values between begin and end - 1 (inclusively).


[Figure: array regions labelled written slice, empty, elements to read, and buffers; operations 1. Flush and 2. Push]


Fig. 4: Intermediate state of the classification step after processing some elements.
The first element to be read is being pushed to the orange buffer which
gets flushed beforehand.

Block contracts. Much like method contracts, block contracts abstract from
details in control flow and implementation details of the Java code block they
annotate. Block contracts can decompose large and
complex method implementations and allow one to focus on the relevant effects
of individual components (i.e., code blocks) formalised in the postconditions of
the block contracts.
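
The three annotation kinds discussed in this section (ghost variables, assertions, block contracts) can be combined as in the following sketch. It is a hand-written illustration with hypothetical names, using common JML comment syntax; it has not been run through KeY, and the exact clauses KeY accepts for block contracts may differ in detail.

```java
final class ProofGuidanceSketch {

    /*@ public normal_behaviour
      @ requires a != null && 0 <= i && i < a.length && 0 <= j && j < a.length;
      @ ensures a[i] == \old(a[j]) && a[j] == \old(a[i]);
      @ assignable a[i], a[j];
      @*/
    static void swap(int[] a, int i, int j) {
        // Ghost variable: visible to the prover only, no runtime effect.
        //@ ghost int oldI = a[i];
        int tmp;
        // Block contract: summarises the effect of the following block.
        /*@ normal_behaviour
          @ ensures tmp == \old(a[i]) && a[i] == \old(a[j]);
          @ assignable a[i];
          @*/
        {
            tmp = a[i];
            a[i] = a[j];
        }
        // Assertion: an intermediate proof target for the automation.
        //@ assert tmp == oldI;
        a[j] = tmp;
    }
}
```

After the block, a prover can continue with the block's postcondition alone, without re-examining the statements inside it.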


4.2 Central Ideas Used in the Proofs of the Steps of ips⁴o


In this section we zoom in on a few central concepts from the proofs of the algo- 
rithm. We mainly focus on the classification step which (1) establishes the most 
relevant invariants of the recursion step, and (2) showcases a particular proof 
technique related to the verification of the efficient algorithm implementation 
used in this case study. 

Relevant Invariants. During classification, the algorithm rearranges the 
input elements into blocks (of a given size B) such that all elements in a block 
are classified into the same bucket. Furthermore, it counts the elements in each 
bucket. Fig. 4 shows an intermediate state of the classification step. It is checked
to which bucket the next element belongs, that bucket's buffer is flushed if
needed, and then the element is pushed to the buffer according to its classification.
This is done in batches of m elements at once such that the classification
can take advantage of batched queries (that allow the CPU to apply instruction
parallelism).


/*@ loop_invariant begin <= i <= end && begin <= write <= i;
  @ loop_invariant (\forall int b; 0 <= b < num_buckets; (\forall int i;       // (1)
  @     b * BUFFER_SIZE <= i < b * BUFFER_SIZE + buffers.lengths[b];
  @     classOf(buffers.buffer[i]) == b));
  @ loop_invariant (\forall int block; 0 <= block < (end-begin)/BUFFER_SIZE;   // (2)
  @     (\exists int b; 0 <= b < num_buckets; (\forall int i;
  @     begin + block * BUFFER_SIZE <= i < begin + (block+1) * BUFFER_SIZE;
  @     classOf(values[i]) == b)));
  @ loop_invariant (\forall int element;                                       // (3)
  @     \old(countElement(values, begin, begin, begin, end, buffers, element)) ==
  @     countElement(values, begin, write, i, end, buffers, element));
  @ loop_invariant (\forall int b; 0 <= b < num_buckets; bucket_counts[b] ==   // (4)
  @     (\num_of int i; begin <= i < write; classOf(values[i]) == b));
  @ loop_invariant write - begin == (\sum int b;                               // (5)
  @     0 <= b < num_buckets; bucket_counts[b]);
  @ loop_invariant (\forall int b; 0 <= b < num_buckets;                       // (6)
  @     isValidBufferLen(buffers.lengths[b], bucket_counts[b]));
  @ loop_invariant buffers.count() == i - write;                               // (7a)
  @ loop_invariant (i - begin) …                                               // (7b)
  @ loop_invariant (write - begin) …                                           // (7c)
  @*/


Fig. 5: Specification of the classification loop. begin and end are the boundaries
of the slice that is being processed, i is the offset of the next element that will be
classified, write is the end offset of the written slice. The array bucket_counts
contains the element count for each bucket.


After classifying all elements, the count of all elements in each bucket's buffer
is added to get the full element count for each bucket. We define the written slice
to be the elements that were already flushed to the input array.

To exemplify the nature of the specification used in this case study, we discuss
the inductive loop invariants of the classification loop which allowed us to close
the proof for this step. Fig. 5 shows the corresponding JML annotations.⁴

1. The buffers contain only elements of their respective bucket.
2. The written slice is made up of blocks of size B where each block contains
   only elements of exactly one bucket.
3. The permutation property is maintained.
4. The per-bucket element counts are exactly the number of elements of the
   corresponding bucket in the written slice.
5. The sum of all per-bucket element counts equals the size of the written slice.
6. The buffer size of each bucket is valid.
7. The spacings are well formed:
   (a) The total element count in all buffers equals the length of the free slice.
   (b) The start offset of the current batch is a multiple of m.
   (c) The length of the written slice is a multiple of B.


Invariants 1 and 2 straightforwardly encode the block structure during classification
from the abstract algorithm. They are also needed as preconditions
for the following partitioning step. The permutation invariant 3 ensures that no
elements are lost during classification by stating that the original array content
is a permutation of the union of all elements not yet handled, the written slice
and the union of all buffers. Invariants 4 and 5 are needed to show that the
bucket element counts are correct and to show that all elements of the input
will have been taken into account eventually. These invariants were engineered
by translating the ideas from the abstract algorithm into the Java situation.
The remaining two invariants were discovered later in the verification process:
The validity invariant 6 was only discovered during the proof of the cleanup step
(where it becomes relevant). A buffer is called valid if (1) the number of elements
written back during classification is a multiple of the block size B and (2) empty
buffers are only allowed when nothing has yet been written back. Invariant 7
was discovered last by inspecting the open proof goals of failed attempts, and is
mostly needed to show that write operations to the heap remain in bounds.


4 In the actual implementation, the invariants are grouped in several model methods.


Invariant 5, while in principle derivable from the other invariants, simplifies
the proof that the sum of all bucket element counts is the size of the input after 
termination. Adding it as a redundant loop invariant avoids having to prove the 
same statement repeatedly using the other invariants. 


When flushing a buffer, the algorithm must not overwrite the batch that it is
currently processing nor the elements that were not processed yet. This property
is captured in invariant 7. First and foremost, 7a ensures that there is enough
space to write a whole buffer if a buffer is full. When pushing the elements of
the current batch to their buckets, the algorithm makes sure that the start of
the batch will never be overwritten. However, this was not provable from the
scope of this loop: For example, let there be B total elements in all buffers, all of
which are in the buffer of some bucket b when we are trying to push the second
element of a batch to b's buffer. A flush may then happen before the push which
would illegally overwrite the first element of the batch. This case is shown to be
impossible by adding invariants 7b and 7c. In general, this holds for any values
where B is a multiple of the batch size m.
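
The interplay of flushing, pushing and the spacing invariants can be replayed at run time with the following simplified sketch. It is not the verified code: it assumes batch size m = 1 (so invariant 7b holds trivially), begin = 0, a linear-scan classOf, and a hypothetical class layout; invariants 5, 7a and 7c are checked dynamically instead of being proved.

```java
final class ClassificationSketch {
    final int[] values;      // input slice, rearranged in place
    final int[] splitters;   // sorted splitters
    final int blockSize;     // B
    final int[][] buffers;   // one buffer of size B per bucket
    final int[] bufLen;      // current fill level of each buffer
    final int[] bucketCounts; // elements per bucket in the written slice
    int write = 0;           // end offset of the written slice

    ClassificationSketch(int[] values, int[] splitters, int blockSize) {
        this.values = values;
        this.splitters = splitters;
        this.blockSize = blockSize;
        int k = splitters.length + 1;
        this.buffers = new int[k][blockSize];
        this.bufLen = new int[k];
        this.bucketCounts = new int[k];
    }

    int classOf(int e) {
        int b = 0;
        for (int s : splitters) if (s < e) b++;
        return b;
    }

    void run() {
        for (int i = 0; i < values.length; i++) {
            checkInvariants(i);
            int e = values[i];               // read before any flush can touch the array
            int b = classOf(e);
            if (bufLen[b] == blockSize) {    // flush: buffer b is full
                // By 7a the buffers hold i - write >= B elements, so the target
                // range [write, write + B) lies entirely below index i.
                System.arraycopy(buffers[b], 0, values, write, blockSize);
                write += blockSize;
                bucketCounts[b] += blockSize;
                bufLen[b] = 0;
            }
            buffers[b][bufLen[b]++] = e;     // push
        }
        checkInvariants(values.length);
    }

    /** Full per-bucket counts: written elements plus elements still buffered. */
    int[] fullCounts() {
        int[] c = bucketCounts.clone();
        for (int b = 0; b < c.length; b++) c[b] += bufLen[b];
        return c;
    }

    private void checkInvariants(int i) {
        int buffered = 0, counted = 0;
        for (int len : bufLen) buffered += len;
        for (int c : bucketCounts) counted += c;
        if (counted != write) throw new IllegalStateException("invariant 5");
        if (buffered != i - write) throw new IllegalStateException("invariant 7a");
        if (write % blockSize != 0) throw new IllegalStateException("invariant 7c");
    }
}
```

With batches of more than one element, the element at index i is no longer the only unread position that must be protected, which is where invariants 7b and 7c become necessary.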


Classification Search Tree. As mentioned in Sec. 3, classification employs
an implicit binary search-tree data structure to find the bucket to which an
element belongs. This is a complete binary tree where the root of a subtree
stores the median of the splitters belonging to the subtree. The splitters are
stored in an array with the root at index one. The children of the node stored
at index i are stored at indices 2i and 2i + 1. Fig. 6 shows the branch-free loop
to compute the bucket c(e) for an element e.


It was difficult to verify this routine with hard to find loop invariants. On the 
other hand, an implementation using binary search on a linearly sorted array 
would have been easier to verify; but without the benefits of branch-freedom. 
Hence, this optimisation is an example where algorithm engineering decisions 
make verification more complicated. Our solution to the problem was to imple- 
ment the binary search algorithm on the array of indices in parallel next to the 
efficient tree search by means of ghost variables and ghost code. A set of coupling 
invariants set the variables of heap and array into relation. [Fig. 7]illustrates the 
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public int classify(int value) { 
int b = 1; 
for (int i = 0; i < log_2(k); ++i) 
b=2¥* b+ (tree[b] < value ? 1: 0); 
return b - k; 


} 


Oc Rob. 


Fig. 6: Classifying a single element without branches. The loop in line [3] can be 
unrolled, because $\log_2 k$ is at most 8. The conditional in line [4] can be compiled 
into predicated instructions, such as CMOV, or, more commonly, into a CMP/SETcc 
sequence, rendering the code effectively branch-free. 
Fig. 7: Visualisation of finding the classification for an element in the binary 
heap search tree (left) and in a linearly sorted array (right) for $k = 8$ buckets. 
The red path indicates the same classification as a path on the heap tree and 
as a nesting of intervals for the binary search. The circled numbers indicate the 
index in the array representing the search tree; the italic numbers show the 
bucket number and the upright numbers the index of the splitter against which 
the element is compared. 


relationship between the search in the binary heap and the search in the ghost 
code sorted index array. 
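
The idea of running a ghost binary search next to the tree search can be imitated at runtime. The following Java sketch is not the JML ghost code of the case study: the class `CoupledSearch`, the ghost variables `lo`, `hi` and `width`, and the concrete coupling conditions are assumptions chosen for illustration, and the invariants are checked dynamically rather than proved.

```java
final class CoupledSearch {
    /**
     * Classifies `value` with the branch-free tree descent while a "ghost"
     * binary search over the sorted splitters runs in lockstep.
     * tree[1..k-1] is the heap layout of sorted[0..k-2]; k is a power of two.
     */
    static int classifyWithGhost(int[] tree, int[] sorted, int value) {
        int k = tree.length;
        int b = 1;               // real state: node index in the implicit tree
        int lo = 0, hi = k;      // ghost state: candidate buckets are lo .. hi-1
        int width = k;           // ghost state: number of candidate buckets
        while (b < k) {
            // Coupling invariants relating the tree node to the bucket interval.
            check(hi - lo == width, "interval width");
            check(b == k / width + lo / width, "node/interval coupling");
            int mid = lo + width / 2;
            check(tree[b] == sorted[mid - 1], "both searches compare the same splitter");

            boolean right = tree[b] < value;
            b = 2 * b + (right ? 1 : 0);             // real step
            if (right) lo = mid; else hi = mid;      // ghost step
            width /= 2;
        }
        check(hi - lo == 1 && b - k == lo, "both searches agree on the bucket");
        return b - k;
    }

    private static void check(boolean condition, String what) {
        if (!condition) throw new AssertionError("coupling violated: " + what);
    }
}
```

In this sketch, the node index at depth d equals 2^d plus the number of the current interval on that level, which is what ties the heap position to the interval of the binary search.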

Besides Classification. The algorithm's initial step of drawing samples 
and determining the splitters to be used in the recursion step operates on a 
fixed number of elements such that most of the properties of this step can be 
shown by an exhaustive bounded analysis.⁵ The permutation and cleanup steps 
build upon the same general principles already established during classification, 
but require additional bookkeeping to relate different indices into 
the array. The implementation consists of four quadruply nested loops, and the 
innermost loop has three different exit paths. Hence, verifying the permutation 
and cleanup parts needed the most proof rule applications to close. 


4.3 Selected Cross-cutting Concerns of the Proofs 


While constructing the correctness proofs for ips⁴o, we made the following 
noteworthy observations. 


5 The extended version [3] elaborates on this. 
Non-trivial termination proofs. For many algorithms, termination is 
easy to show. However, even though ips⁴o essentially follows an array-based 
divide-and-conquer strategy, its termination proofs are non-trivial. We 
exemplify this with the termination of the partitioning step. 

The textbook version of quicksort removes the splitter element (pivot) from 
the partitions. Hence, the partition size is a variant (termination witness), as 
each recursive call receives a strictly smaller slice to work on. For our ips⁴o 
implementation, however, this is not the case, as the splitter elements remain 
within the partitions. The following observation ensures termination: 
If there are two elements $e_1, e_2$ in the input slice that are classified into two 
different buckets ($c(e_1) \neq c(e_2)$), then the number of elements in each bucket 
is strictly below the size of the input slice. While this observation may look 
trivial to a human reader, it requires a non-trivial interactive proof in KeY. One 
has to reason that for every bucket $b_1$, there is a different non-empty bucket $b_2$, 
implying that $b_1$ is smaller than the input slice. This variant allows proving the 
termination of the recursion. 
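
The counting argument behind this observation can be spelled out as follows. The notation is introduced here for illustration only: $n$ denotes the size of the input slice and $|b|$ the number of elements classified into bucket $b$. For an arbitrary bucket $b_1$, at least one of the two buckets $c(e_1)$ and $c(e_2)$ differs from $b_1$, because they differ from each other; call it $b_2$. It is non-empty since it contains $e_1$ or $e_2$. Hence

$$n \;=\; \sum_{b} |b| \;\ge\; |b_1| + |b_2| \;\ge\; |b_1| + 1 \quad\Longrightarrow\quad |b_1| \;\le\; n - 1 \;<\; n .$$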

Multiple variants of property formalisations. One important insight 
from the case study is that for some properties it pays off to have not one but two 
(or multiple) syntactically different, yet semantically equivalent formalisations 
at hand and to be able to use them at different places in the proofs. We give 
examples on sortedness and permutation properties. 

Sortedness of an array can be expressed in first-order logic by either of the 
following equivalent formulae: 

$$\forall i.\; 0 \le i < n-1 \rightarrow v[i] \le v[i+1] \qquad (1)$$

$$\forall i, j.\; 0 \le i < n \land i < j < n \rightarrow v[i] \le v[j] \qquad (2)$$

While (1) compares every array element with its successor, (2) allows comparison 
between arbitrary indices in the array. In the case study, when proving 
sortedness, (1) is used. However, when assuming sortedness in a proof (e.g., in 
preconditions), the transitive representation (2) is more useful. Technically, both 
representations are formulated as model methods and their equivalence has been 
shown using a simple inductive argument, which allowed us to switch between 
representations as needed. 
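
As a runtime analogue of the two formalisations, the successor-based and the pairwise formulation can be written as two Java predicates. This is a sketch for illustration; in the case study both are JML model methods, and the names `sortedAdjacent` and `sortedPairwise` are hypothetical.

```java
final class Sortedness {
    /** Successor-based formulation: every element is at most its successor. */
    static boolean sortedAdjacent(int[] v) {
        for (int i = 0; i + 1 < v.length; i++) {
            if (v[i] > v[i + 1]) return false;
        }
        return true;
    }

    /** Pairwise (transitive) formulation: v[i] <= v[j] for all indices i < j. */
    static boolean sortedPairwise(int[] v) {
        for (int i = 0; i < v.length; i++) {
            for (int j = i + 1; j < v.length; j++) {
                if (v[i] > v[j]) return false;
            }
        }
        return true;
    }
}
```

The first predicate is cheaper to establish (one comparison per index), while the second directly yields a fact about any two positions, which mirrors why the proofs establish one form and assume the other.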

A similar effect with two formalisation variants can be observed for the 
permutation property: For two sequences $s_1, s_2$, the expression seqPerm($s_1$, $s_2$) 
formulates that there exists a bijection $\pi$ between the indices of $s_1$ and $s_2$ such 
that $s_1[\pi(i)] = s_2[i]$ for all indices $i$. This straightforward formulation of the 
property using an explicit permutation witness $\pi$ proved helpful to show statements 
like $\sum_i s_1[i] = \sum_i s_2[i]$ under the assumption that $s_1$ and $s_2$ are 
permutations of one another. However, proving the permutation property using 
this definition can be difficult since one has to provide the explicit witness for $\pi$. 
Therefore, an alternative formulation has been used, based on the fact that two 
sequences are permutations of one another iff they are equal when considered 
as multisets, i.e., iff every element occurs equally often in both sequences.⁶ The 
equivalence of the two notions is made available to KeY as a (proved) axiom. 
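
The two views of the permutation property can be contrasted in a small Java sketch. It is an illustration only and not taken from the case study: `isPermutation` implements the multiset view, `isWitness` checks a given index bijection, and both names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class PermutationCheck {
    /** Counts how often each value occurs, i.e. views the array as a multiset. */
    static Map<Integer, Integer> toMultiset(int[] s) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (int x : s) {
            counts.merge(x, 1, Integer::sum);
        }
        return counts;
    }

    /** Multiset formulation: every value occurs equally often in both arrays. */
    static boolean isPermutation(int[] s1, int[] s2) {
        return s1.length == s2.length && toMultiset(s1).equals(toMultiset(s2));
    }

    /** Witness formulation: pi is a bijection on indices with s1[pi[i]] == s2[i]. */
    static boolean isWitness(int[] s1, int[] s2, int[] pi) {
        if (s1.length != s2.length || pi.length != s1.length) return false;
        boolean[] hit = new boolean[pi.length];
        for (int i = 0; i < pi.length; i++) {
            if (pi[i] < 0 || pi[i] >= pi.length || hit[pi[i]]) return false; // not a bijection
            hit[pi[i]] = true;
            if (s1[pi[i]] != s2[i]) return false;
        }
        return true;
    }
}
```

The multiset check needs no witness, whereas `isWitness` can only confirm a bijection that somebody else has supplied; this is the same asymmetry that makes the multiset form easier to prove and the witness form easier to use.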


6 which is a standard formalisation often used in proofs of sorting algorithms 
Proving frame conditions. To reason that the memory footprints of different 
data structures do not overlap, KeY supports the concept of dynamic 
frames [18]. To be cache-efficient, the ips⁴o implementation uses a number of 
auxiliary buffers, realised as Java arrays. In the Java language, array variables 
may alias. In the case study, methods have up to 11 array parameters which all 
must not alias with each other. JML possesses an operator \disjoint which can 
be used to specify that the sets of memory locations provided as arguments must 
be disjoint. KeY then generates the (quadratically many) inequalities capturing 
the non-aliasing. KeY is not slowed down since all generated formulas are in- 
equalities between identifiers. We used an auxiliary class to group all arrays for 
reuse during the recursion which reduced the required specification overhead. 
This shows that dynamic frames are an adequate formalism to deal with the 
framing problem for this type of algorithmic verification challenge. 
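
To give a feeling for the "quadratically many" inequalities: n pairwise distinct references require n(n-1)/2 inequalities, i.e. 55 for the 11 array parameters mentioned above. The following Java sketch is a runtime analogue for illustration only; KeY discharges these facts statically, and the class `NonAliasing` with its methods is hypothetical.

```java
final class NonAliasing {
    /** Number of inequalities a != b needed for n pairwise distinct references. */
    static int pairCount(int n) {
        return n * (n - 1) / 2;
    }

    /** Runtime analogue of the specification: no two arguments are the same array object. */
    static boolean pairwiseDistinct(int[]... arrays) {
        for (int i = 0; i < arrays.length; i++) {
            for (int j = i + 1; j < arrays.length; j++) {
                if (arrays[i] == arrays[j]) return false; // reference comparison, not content
            }
        }
        return true;
    }
}
```

Two arrays with equal contents still count as distinct here, since only aliasing of the references matters for framing.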

Integer overflow. As mentioned above, KeY uses mathematical integers to 
model machine int values. For this to be sound, arithmetic expressions must not 
over- or underflow the ranges of their respective primitive type. We hence verified 
the absence of integer overflows in all methods proved in KeY. Corresponding 
assertions are automatically generated by KeY during symbolic execution: every 
arithmetic operation generates a new goal where the absence of overflow for 
this operation is checked. There were only a few lines of additional specification 
required. The overwhelming majority of those proofs closed without interactions 
since they could be derived from already proven invariants. 
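
The kind of proof obligation generated for each arithmetic operation can be illustrated with a midpoint computation. The example is not taken from the ips⁴o code; it is a generic sketch in which `Math.addExact` plays the role of the overflow check at runtime.

```java
final class OverflowDemo {
    /** Naive midpoint: lo + hi may exceed Integer.MAX_VALUE and wrap around. */
    static int midNaive(int lo, int hi) {
        return (lo + hi) / 2;
    }

    /** Overflow-free midpoint for 0 <= lo <= hi. */
    static int midSafe(int lo, int hi) {
        return lo + (hi - lo) / 2;
    }

    /** Checked variant: throws ArithmeticException instead of wrapping around. */
    static int midChecked(int lo, int hi) {
        return Math.addExact(lo, hi) / 2;
    }
}
```

With mathematical integers both `midNaive` and `midSafe` denote the same value; only a proof that the intermediate sum stays within the `int` range justifies treating the naive version that way.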

Performance and Verifiability. Optimisations to the code in the case 
study sometimes had an impact on the required effort to verify and sometimes 
did not: verifying the binary search tree optimisation explained in [Sec. 4.2] was 
pretty costly whereas the reverification of the project after the optimisations 
mentioned in went through pretty automatically. Both optimisations 
bought a noteworthy bit of performance. A key factor for the complexity of the 
verification is how much the optimisation modifies data representation. 


4.4 Proof Statistics 


[Table 1] gives an overview of the size of the proofs in this case study. A rule 
application in the KeY system may be part of the symbolic execution of Java 
code, part of first-order or theory reasoning. 

The overall ratio between specification and source code lines is about 3:1, 
which, since many model methods were declared, is still quite low. Using model 
methods to formulate lemmas that deduplicate the proofs allowed us to obtain an 
overall proof with only about $10^6$ steps. Consider in comparison a recent case study 
performed with KeY: the numbers of branches and rule applications are in the 
same order of magnitude, but our case study has 6x as many lines of code 
and 7x as many lines of specification. However, it also required twice the number 
of manual interactions. 

The specification consists of 179 JML contracts, of which 114 could be verified 
with fewer than ten manual interactions. However, some methods require 
extensive interaction. Most interactions were needed to prove the contract of a 
Table 1: Proof statistics: total number of rule applications, number of interactive 
rule applications, proof branches, branches closed by calls to an SMT solver, lines 
of Java code (LOC), lines of JML specification (LOS), ratio LOS/LOC. 


Class        Rule apps  Interactions  Branches  SMT  LOC   LOS  LOS/LOC
BucketPtrs      206348           683       585   24   48   441     9.19
Buffers          47258           120       291    0   44   175     3.98
Classifier      265743           747      1540  348  123   481     3.91
Permute         160431          1139      1104  272  130   413     3.18
Cleanup         113903           485       648  207  102   181     1.77
Sorter          120079           519       105    7   93   382     4.11
Other           215629           724       742   44  249   430     1.73

Total          1015488          3932      5615  789  902  2503     3.17


Table 2: Most common manual proof interactions in the largest proof (contract 
of Permute::swap_block). 


Proof step                            Count
Expanding model method definitions       95
First order equality reasoning           83
Proof state simplification               71
Memory footprint reasoning               69
Applying model method contracts          65
Expanding conditionals                   64
Quantifier instantiation                 53
Splitting if-then-else expressions       36
Case distinctions on equalities          35


method wrapping an inner loop from the permutation step, with 836 interactions, 
and the cleanup method, with 475. Those were also the biggest proofs for method 
contracts, with about 125 000 and 110 000 rule applications, respectively. Without 
heavy usage of lemma methods, those proofs would have been multiple times 
larger. Notably, most of the interactions for constructing these proofs were 
unpacking model methods, using their contracts, simplifying the sequent, and using 
observer dependencies; see [Table 2]. 


5 Performance of the ips⁴o Java Version 


As our stated goal is an implementation that is both verified and has state-of-the-art 
efficiency, we performed experiments to measure the performance of our 
Java implementation of ips⁴o. Our experimental setup is similar to that of the 
original ips⁴o paper [2]; in particular, we use all of the same input distributions 
in our evaluation: 


— UNIFORM: Values are pseudo-random numbers in [0, 2??]. 
— ONES: All values are 1. 

— SORTED: Values are increasing. 

— REVERSED: Values are decreasing. 
Fig. 8: Speedup of ips⁴o over Arrays.sort() for the UNIFORM distribution. 


— UNSORTED-TAIL: Like SORTED, except the last $\lfloor \sqrt{n} \rfloor$ elements are shuffled. 

— ALMOST-SORTED: Like SORTED, except || random adjacent pairs are 
swapped. 

— EXPONENTIAL: Values are distributed exponentially. 

— ROOTDUP: Sets $A[i] = i \bmod \lfloor \sqrt{n} \rfloor$. 

— TWODUP: Sets $A[i] = i^2 + m/2 \bmod m$, where $m = \lfloor \log_2 n \rfloor$. 

— EIGHTDUP: Sets $A[i] = i^8 + m/2 \bmod m$, where $m = \lfloor \log_2 n \rfloor$. 


We performed experiments using OpenJDK 20 on three different machines/CPUs: 
an Intel i7 11700 at 4.8 GHz, an AMD Ryzen 3950X at 3.5 GHz, 
and an Ampere Altra Q80-30 ARM processor at 3 GHz. We repeated each 
measurement multiple times and report the mean execution times of all iterations. 
For input sizes $n \le 2^{13}$, we took 1000 measurements, for $2^{14} \le n \le 2^{20}$ we took 
25 measurements, and for $2^{21} \le n \le 2^{30}$ we took 5 measurements. In addition, 
we repeated the entire benchmark 5 times to get results across different 
invocations of the JVM. This means that there are between 25 and 5000 data points 
for each input size, distribution, and architecture. 

On all three machines, ips⁴o outperforms OpenJDK's Arrays.sort() for int 
by a factor of 1.33 to 1.83 for large inputs on the UNIFORM distribution. These 
results can be found in [Fig. 8]. For comparison, [Fig. 9] shows the runtimes, 
including the C++ implementation of ips⁴o, on the Intel machine. 

Most other distributions show similar results (with a speedup factor of up 
to 2.27), with the exception of pre-sorted or almost sorted inputs. These dis- 
tributions — which include ONES, SORTED, REVERSED, and ALMOST-SORTED, 
but not UNSORTED-TAIL — are detected by the adaptive implementation of 
Arrays.sort() and are not actually sorted by the default dual-pivot quicksort, 
but by a specialised merging algorithm, which ends up doing almost no work on 
these distributions. 

In summary, our experiments show that the verified Java implementation of 
ips⁴o outperforms the standard dual-pivot quicksort algorithm across a variety 
Fig. 9: Runtime for the UNIFORM distribution on Intel. 


of input distributions and hardware. The same opportunistic merging algorithm 
currently implemented by Arrays.sort() could be used in conjunction with ips⁴o, 
which would shortcut the work in case the input is already (almost) sorted. 


6 Related Work 


JML and KeY have been used previously to verify sorting algorithms. Besides the 
verifications of nontrivial proof-of-concept implementations like Counting Sort 
and Radixsort [12], KeY has been used to verify the sorting algorithms deployed 
with OpenJDK: The formal analysis with KeY revealed a relevant bug in the 
TimSort implementation shipped with the JDK as the standard algorithm for 
generic data types [11]. A bugfix was proposed and it was shown that the fixed 
code does not throw exceptions (but sortedness or permutation were not shown). 
For the Dual Pivot Quicksort implementation of the JDK (used to sort arrays 
of primitive values), the sorting and permutation property were successfully 
specified and verified using KeY [4]. However, the complexity and size of those 
verification proofs are considerably smaller than in our ips⁴o case study. Other 
pivotal classes of the JDK were also successfully verified using KeY [5,16]. 

Lammich et al. verified efficient sorting routines by proving functional 
properties on abstract high-level algorithmic descriptions in the Isabelle/HOL 
theorem prover and then refining them down to LLVM code. In that framework, 
even parallelised implementations can be analysed to some degree if no shared 
memory is used [21]. While the verified algorithms are on par with the performance 
of the standard library, they do not reach the efficiency of ips⁴o, and the 
authors explicitly list sample sorting as future work. Mohsen and Huisman [34] 
provide a general framework for the formal verification of swap-based sequential 
and parallel sorting routines, but restrict it to the analysis of the permutation 
property. Since ips⁴o is not entirely swap-based (due to the external buffers in 
the classification step), it is not covered by their approach. 




There exists a large number of prominent algorithm verification case studies 
that focus on the challenges provided by the verification and do not consider the 
performance of the implementation [B]7]28[1712716126]30]. 

Finally, there are several large-scale verification projects like the verified 
microkernel L4.verified [19], the CertiOS framework [37] for the verification of 
preemptive OS kernels, or the verified hypervisor Hyper-V that easily top this 
case study w.r.t. both verified lines of code and invested person years. However, 
they target a completely different type of system to be verified and have their 
focus on operating-system-related challenges, like handling concurrent low-level 
data structures or concurrent accesses to resources. While they also address 
similar performance questions, the algorithmic aspects are considerably different. 


7 Conclusions and Future Work 


We have demonstrated that a state-of-the-art sorting algorithm like ips⁴o can be 
formally verified starting directly with an efficient implementation that has not 
been modified to ease verification. The involved effort of several person months 
was considerable but seems worthwhile for a widely used basic toolbox function 
with potential to become part of the standard library of important programming 
languages. Parts of this verification or at least the basic approach can be reused 
for related algorithms like radix sort, semisorting, aggregation, hash-join, random 
permutations, index construction etc. 

Future work could look at parallel versions of ips⁴o or implementations that 
use advanced features such as vector instructions (e.g., as in [36]). Of course, 
further basic toolbox components like collection classes (hash tables, search trees 
etc.) should also be considered. 

On the methodology side it would be interesting to compare our approach of 
direct verification with approaches that start from a verified abstraction of the 
actual code that is later refined to an implementation. Besides the required effort 
for verification and the efficiency of the resulting code, a comparison should also 
consider the ease of communicating with algorithm engineers, which on the one 
hand may benefit from an abstraction but on the other hand is easier when 
based on their original implementation. Our case study involved both experts in 
program verification and experts in algorithm engineering, which proved essential 
to its success. 

For much of the desirable future work, verification tools and methods need 
further development, in particular for efficient parallel programs and high-per- 
formance languages like C++ or Rust. It is also important to better support 
evolution of the implementation, since it is quite rare that one wants to keep 
an implementation over decades — algorithm libraries have to evolve with added 
functionality and changes in hardware, compilers or operating systems. 
Abstract. Metric first-order temporal logic (MFOTL) is an expressive formalism 
for specifying temporal and data-dependent constraints on streams of time-stamped, 
data-carrying events. It serves as the specification language of several 
runtime monitors. These monitors input an MFOTL formula and an event stream 
prefix and output satisfying assignments to the formula's free variables. For com- 
plex formulas, it may be unclear why a certain assignment is output. We propose 
an approach that accompanies assignments with detailed explanations, in the form 
of proof trees. We develop a new monitor that outputs such explanations. Our 
tool incorporates a formally verified checker that certifies the explanations and a 
visualization that allows users to interactively explore and understand the outputs. 


1 Introduction 


Runtime monitoring is concerned with the analysis of events produced by a system during 
its execution. An online monitor searches for given complex patterns in event streams, 
processing the stream incrementally, i.e., one event at a time. If it finds a pattern match, 
the monitor outputs a verdict to its user. The nature of a verdict depends on both the 
monitor and its pattern specification language. For propositional specification languages, 
such as metric temporal logic (MTL) [612 T], typical verdicts are streams of Booleans [8| 
Ds B1. where each Boolean signifies the presence or the absence of a pattern match, i.e., 
the satisfaction or violation of the MTL formula at every position in the input stream. 

Users might find Boolean outputs difficult to interpret, especially when complex pat- 
terns like nesting temporal operators are involved. In particular, Boolean verdicts give no 
insight into how monitors produce them—we have to trust their correctness. Even when 
assuming infallible monitors, verdict justifications can help us to ensure that we correctly 
expressed our intentions in the specification and, e.g., that it is not vacuously true [23]. 

Lima et al. [25] propose the use of richer verdicts in an MTL monitor. Specifically, 
they use proof trees in a dedicated proof system resembling MTL’s semantics to explain 
why a formula is satisfied or violated. They develop the EXPLANATOR2 monitor, which 
outputs a stream of size-minimal proof trees, and design an interactive graphical user 
interface for exploring and understanding these informative verdicts. In addition, they 
formally verify, in the Isabelle/HOL proof assistant, a proof tree checker certifying that 
their proof system rules were correctly applied. Thus proof tree verdicts serve a two-fold 
purpose: as machine-checkable certificates and human-readable explanations. 

In this work, we significantly widen the scope of the "proof tree verdicts" approach. 
We provide certifiable and explainable monitoring verdicts for metric first-order temporal 
logic (MFOTL) [14] with bounded future operators and without equality between variables. 
MFOTL extends MTL with data parameters and first-order quantification and is an 
expressive formalism with many practical applications [7,10,13]. We extend Lima et al.'s 
MTL proof system to MFOTL with the expected rules for quantifiers (Section 2): e.g., the 
universally quantified formula $\forall x.\ \alpha$ is satisfied if $\alpha$ with $x$ replaced by $d$ is satisfied for 
all domain values $d$. The key challenge here is that the domain is typically infinite, which 
results in the above proof rule for $\forall$ being infinitely branching. This is problematic because 
it is unclear how to validate a correct application of the $\forall$ rule in a proof tree checker. 

A crucial observation is that without equality between variables, proof trees cannot 
distinguish values outside of the active domain, i.e., the finite set of data values from 
the monitored event stream prefix and from the formula's constants. Thus, the active 
domain's size plus one bounds the number of choices for $d$ requiring different proof 
trees, and we can reuse them, with the extra "plus one" representing values outside the 
active domain. Thus, to represent the $\forall$ rule, it suffices to store a finite partition of the 
domain and one subproof for each part. We obtain finite proof objects, develop a checker 
for them, and formally verify the checker's correctness in Isabelle/HOL (Section 3). 
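
The "active domain plus one" idea can be illustrated with a small Java sketch. It is not the algorithm of the tool described here: the class `ActiveDomainQuantifier`, the use of strings as data values, and the way a fresh value is picked are assumptions made for illustration. The sketch is only sound under the stated condition that the body cannot tell two values outside the active domain apart.

```java
import java.util.Set;
import java.util.function.Predicate;

final class ActiveDomainQuantifier {
    /** Returns some value that does not occur in the active domain. */
    static String freshValue(Set<String> activeDomain) {
        String candidate = "fresh";
        while (activeDomain.contains(candidate)) {
            candidate += "'";
        }
        return candidate;
    }

    /**
     * Decides "for all d: body(d)" over an infinite domain by testing every
     * active-domain value plus one representative of all remaining values.
     * Assumption: body treats all values outside the active domain alike.
     */
    static boolean forAll(Set<String> activeDomain, Predicate<String> body) {
        for (String d : activeDomain) {
            if (!body.test(d)) return false;
        }
        return body.test(freshValue(activeDomain));
    }
}
```

The partition used implicitly here consists of one singleton part per active-domain value and one part for the complement, so the number of checks is the active domain's size plus one.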

The proof system explains how to deal with closed MFOTL formulas. A Boolean ver- 
dict for a formula with free variables only makes sense relative to a variable assignment. 
Hence, traditional MFOTL monitors compute sets of satisfying variable assignments 
instead of Boolean verdicts. In our setting, an explanation for a formula with free vari- 
ables must provide a proof tree for any variable assignment (satisfying or violating). For 
infinite domains, there are infinitely many assignments, but the same idea that worked for 
quantifiers comes to our rescue: it suffices to consider a finite partition of the domain for 
each variable. Inspired by binary decision diagrams (BDDs) [16], we organize the partitions 
for different variables hierarchically in partitioned decision trees (PDTs). PDTs are 
trees where each leaf stores a generic data item and each node (representing a variable) 
branches on a finite partition of the domain (Section 4). The partitions may change from 
one node to the other. PDTs can be compacted (or reduced in BDD terminology). 
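
A possible shape of such a data structure is sketched below in Java. This is an illustration under assumptions, not the data type used in the tool: values are strings, each part of a partition is represented as a finite set or the complement of a finite set, and the names `Pdt`, `Leaf`, `Node`, `Part`, `branch` and `lookup` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** A partitioned decision tree over String values with leaves of type A (sketch). */
abstract class Pdt<A> {
    /** Returns the leaf item selected by a total assignment of variables to values. */
    abstract A lookup(Map<String, String> assignment);

    static final class Leaf<A> extends Pdt<A> {
        final A item;
        Leaf(A item) { this.item = item; }
        @Override A lookup(Map<String, String> assignment) { return item; }
    }

    /** One part of a partition: a finite set, or the complement of a finite set. */
    static final class Part {
        final Set<String> values;
        final boolean complement;
        Part(Set<String> values, boolean complement) {
            this.values = values;
            this.complement = complement;
        }
        boolean contains(String d) { return values.contains(d) != complement; }
    }

    static final class Node<A> extends Pdt<A> {
        final String variable;
        final List<Part> parts = new ArrayList<>();
        final List<Pdt<A>> children = new ArrayList<>();
        Node(String variable) { this.variable = variable; }

        /** Adds a branch; callers must keep the parts disjoint and exhaustive. */
        Node<A> branch(Part part, Pdt<A> child) {
            parts.add(part);
            children.add(child);
            return this;
        }

        @Override A lookup(Map<String, String> assignment) {
            String d = assignment.get(variable);
            for (int i = 0; i < parts.size(); i++) {
                if (parts.get(i).contains(d)) return children.get(i).lookup(assignment);
            }
            throw new IllegalStateException("partition does not cover " + d);
        }
    }
}
```

A node for variable x with the parts {a, b} and "everything else" thus represents a function on infinitely many assignments with only two subtrees.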

We thus have arrived at our notion of explainable verdicts for MFOTL formulas: 
PDTs whose leaves are proof objects. We extend our verified checker from proof objects 
to such verdicts and Lima et al.’s algorithm for MTL to MFOTL (Section 5). Our 
algorithm extension is modular in the sense that it merely adds a layer of PDTs, but keeps 
Lima et al.’s algorithms for temporal operators unchanged. We implement the extended 
algorithm in a new monitor and also extend Lima et al.’s interactive visualization of 
proof objects. We demonstrate the effectiveness of our new tool on MFOTL policies 
from the literature (Section 6). In summary, we make the following contributions: 


— We develop a proof system for MFOTL satisfaction and violation at a time-point for 
a given event stream and verify its soundness and completeness in Isabelle/HOL. 

— We finitely represent our proof system's proof trees and formally verify a checker 
for them. The key idea is that finite partitions of infinite domains are sufficient. 

— We design partitioned decision trees (PDTs) to represent functions from variable 
assignments to generic data items in a way that enables sharing and compression. 

— We develop an algorithm computing explanations: PDTs with proof objects as leaves. 
We implement the algorithm in a new monitor, along with an interactive visualization 
of explanations and integrated with the verified proof tree checker for certification. 


Our tool, called WHYMON, is publicly available [2]. 
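$$\begin{array}{lcl}
v, i \vDash \mathit{tt} & & \\
v, i \nvDash \mathit{ff} & & \\
v, i \vDash p(\bar{t}) & \text{iff} & p(\llbracket \bar{t} \rrbracket_v) \in \Gamma_i \\
v, i \vDash x \approx c & \text{iff} & v(x) = c \\
v, i \vDash \neg\alpha & \text{iff} & v, i \nvDash \alpha \\
v, i \vDash \alpha \land \beta & \text{iff} & v, i \vDash \alpha \text{ and } v, i \vDash \beta \\
v, i \vDash \alpha \lor \beta & \text{iff} & v, i \vDash \alpha \text{ or } v, i \vDash \beta \\
v, i \vDash \alpha \rightarrow \beta & \text{iff} & v, i \nvDash \alpha \text{ or } v, i \vDash \beta \\
v, i \vDash \exists x.\ \alpha & \text{iff} & v[x \mapsto d], i \vDash \alpha \text{ for some } d \in \mathbb{D} \\
v, i \vDash \forall x.\ \alpha & \text{iff} & v[x \mapsto d], i \vDash \alpha \text{ for all } d \in \mathbb{D} \\
v, i \vDash \bullet_I \alpha & \text{iff} & i > 0,\ \tau_i - \tau_{i-1} \in I, \text{ and } v, i-1 \vDash \alpha \\
v, i \vDash \bigcirc_I \alpha & \text{iff} & \tau_{i+1} - \tau_i \in I \text{ and } v, i+1 \vDash \alpha \\
v, i \vDash \blacklozenge_I \alpha & \text{iff} & v, j \vDash \alpha \text{ for some } j \le i \text{ with } \tau_i - \tau_j \in I \\
v, i \vDash \lozenge_I \alpha & \text{iff} & v, j \vDash \alpha \text{ for some } j \ge i \text{ with } \tau_j - \tau_i \in I \\
v, i \vDash \blacksquare_I \alpha & \text{iff} & v, j \vDash \alpha \text{ for all } j \le i \text{ with } \tau_i - \tau_j \in I \\
v, i \vDash \square_I \alpha & \text{iff} & v, j \vDash \alpha \text{ for all } j \ge i \text{ with } \tau_j - \tau_i \in I \\
v, i \vDash \alpha \mathbin{S_I} \beta & \text{iff} & v, j \vDash \beta \text{ for some } j \le i \text{ with } \tau_i - \tau_j \in I \text{ and } v, k \vDash \alpha \text{ for all } j < k \le i \\
v, i \vDash \alpha \mathbin{U_I} \beta & \text{iff} & v, j \vDash \beta \text{ for some } j \ge i \text{ with } \tau_j - \tau_i \in I \text{ and } v, k \vDash \alpha \text{ for all } i \le k < j
\end{array}$$


Fig. 1: Semantics of MFOTL for a fixed stream $\sigma = \langle (\tau_i, \Gamma_i) \rangle_{i \in \mathbb{N}}$. 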
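
As an illustration of how the clauses of Fig. 1 read operationally, the following Java sketch evaluates the since operator (and once as a special case) at a time-point of a finite stream prefix. It is a naive evaluator for illustration only, not the monitoring algorithm of this paper: subformulas are given as predicates on time-points, the interval is assumed to be a bounded range [a, b], and the names `SinceSemantics`, `since` and `once` are hypothetical.

```java
import java.util.function.IntPredicate;

final class SinceSemantics {
    /**
     * Evaluates "alpha S_[a,b] beta" at time-point i: beta holds at some j <= i
     * with ts[i] - ts[j] in [a, b], and alpha holds at every k with j < k <= i.
     * ts are the (monotone) time-stamps of the prefix.
     */
    static boolean since(long[] ts, int i, long a, long b,
                         IntPredicate alpha, IntPredicate beta) {
        for (int j = i; j >= 0; j--) {
            long delta = ts[i] - ts[j];
            if (delta > b) break;                          // older time-points are even further away
            if (delta >= a && beta.test(j)) return true;   // witness j found
            if (!alpha.test(j)) break;                     // alpha must hold at j for any older witness
        }
        return false;
    }

    /** "Once" is the special case of "since" whose left argument always holds. */
    static boolean once(long[] ts, int i, long a, long b, IntPredicate alpha) {
        return since(ts, i, a, b, k -> true, alpha);
    }
}
```

Because only past time-points are inspected, the result at time-point i is already determined by the prefix up to i; future operators would additionally have to wait for later events.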


Further Related Work. Lima et al.'s work [25], which we extend, is based on the work by 
Basin et al. [9] that employed proof trees as explanations in the context of understanding 
counterexamples of LTL model checkers. We refer to these works for a discussion of 
related proof systems for propositional temporal logics and regular expressions. 

In the first-order monitoring setting, we are on unexplored territory with verdicts that 
go beyond satisfying assignments. Nonetheless, our work incorporates ideas from existing 
first-order monitors. Most closely related is Havelund et al.'s DEJAVU monitor [18], 
which uses BDDs to represent sets of satisfying assignments. Our work generalizes BDDs 
to branching over partitions of the domain and storing generic data (e.g., proof objects) 
instead of Booleans in the leaves. In addition, the DEJAVU authors make use of the fact 
that without equality between variables the formula's satisfaction cannot be influenced by 
different values outside the active domain. We generalize this observation so that not only 
the satisfaction but rather entire proof trees can be reused when exchanging values outside 
the active domain. Finally, DEJAVU only supports past temporal operators and closed 
formulas, whereas our algorithm supports both past and future operators and free variables. 

Havelund et al.'s key observation fails for equalities between variables. For example, 
the formula $x \approx y \rightarrow p(x, y)$ is satisfied for any pair of distinct values $c \neq d$ outside of the 
predicate $p$'s interpretation, but it is violated if we pick the same value $c$ for both $x$ and $y$. 
A classic result by Ailamazyan et al. shows that for the relational calculus (MFOTL 
without temporal operators) it suffices to distinguish a finite number of equivalence 
classes of values outside of the active domain. While it is conceivable that this result 
generalizes to MFOTL with equality, we leave this generalization as future work. 

The MFOTL monitor MonPoly and its formally verified counterpart VeriMon 
output streams of satisfying assignments for formulas in the so-called monitorable 
fragment. The fragment ensures that all subformulas always evaluate to finite 
sets of satisfying assignments. Our monitor does not suffer from this limitation; moreover, 
it returns all satisfying and violating assignments (labeled and explained as such). 

Outside of first-order monitoring, our visualization takes some inspiration from the 
stream runtime verification tool TeSSLa [24], which can provide output for all intermediate 
streams. Similarly, we provide output for all subformulas, but our proof trees allow 
us to focus on the relevant dependencies between a formula and its subformulas. 


Metric first-order temporal logic (MFOTL). We recall MFOTL's syntax and semantics. 
We fix an infinite domain $\mathbb{D}$ (e.g., containing integers and strings). Terms $t \in \mathbb{T}$ are either 
variables $x, y, z \in \mathbb{V}$ or constants $c, d \in \mathbb{D}$. Overlines indicate lists (finite sequences), e.g., 
if $t$ is a term, then $\bar{t}$ is a list of terms. The grammar below specifies MFOTL's syntax, 
where $p \in \mathbb{E}$ is a predicate name (e.g., a string) and $I \in \mathbb{I} \subset 2^{\mathbb{N}}$ is a non-empty interval. 


$$\alpha ::= \mathit{tt} \mid \mathit{ff} \mid p(\bar{t}) \mid x \approx c \mid \neg\alpha \mid \alpha \land \alpha \mid \alpha \lor \alpha \mid \alpha \rightarrow \alpha \mid \exists x.\ \alpha \mid \forall x.\ \alpha \mid$$
$$\bullet_I \alpha \mid \bigcirc_I \alpha \mid \blacklozenge_I \alpha \mid \lozenge_I \alpha \mid \blacksquare_I \alpha \mid \square_I \alpha \mid \alpha \mathbin{S_I} \alpha \mid \alpha \mathbin{U_I} \alpha$$


Besides the first-order logic operators, the syntax includes the past $\bullet$ (previous), $\blacklozenge$ (once), 
$\blacksquare$ (historically), $S$ (since) and future $\bigcirc$ (next), $\lozenge$ (eventually), $\square$ (always), $U$ (until) 
temporal operators. We use $\bigwedge$ for universal and $\bigvee$ for existential quantification at the 
metalanguage level to avoid confusion with MFOTL formulas. We also use common interval 
notation $[a, b) := \{n \mid a \le n < b\}$ or $[a, c] := \{n \mid a \le n \le c\}$, for $a, c \in \mathbb{N}$ and $b \in \mathbb{N} \cup \{\infty\}$, 
and omit intervals when $I = [0, \infty) = \mathbb{N}$. Whenever we write $[a, c]$, we exclusively denote 
the range $[a, \ldots, c]$ (rather than the two-element sequence $[a, c]$). Furthermore, we assume 
that the intervals of the future operators ($\lozenge$, $\square$, and $U$) are finite (also called bounded). We 
write $a + I$ for $\{a + x \mid x \in I\}$ and $a \mathrel{R} I$ for $\bigwedge x \in I.\ a \mathrel{R} x$ (where $R \in \{<, \le, >, \ge\}$). We 
interpret formulas over streams $\sigma$: infinite sequences of time-stamped sets of events 
$\sigma = \langle (\tau_i, \Gamma_i) \rangle_{i \in \mathbb{N}}$. We call the indices $i \in \mathbb{N}$ time-points, so that $\Gamma_i$ is the set of events and $\tau_i \in \mathbb{N}$ 
is the time-stamp at time-point $i$. The time-stamps $\tau_i$ must be monotone ($\bigwedge i\, j.\ i \le j \rightarrow \tau_i \le \tau_j$) 
and eventually increasing ($\bigwedge \tau.\ \bigvee i.\ \tau_i > \tau$). Each event has the form $p(d_1, \ldots, d_n)$ 
where $p$ is the event name and $d_i \in \mathbb{D}$. Given a total assignment $v$ mapping variables to values 
in $\mathbb{D}$, we define $\llbracket x \rrbracket_v = v(x)$ and $\llbracket c \rrbracket_v = c$. The notation $\llbracket \bar{t} \rrbracket_v$ lifts this operation to 
lists of terms. We define the satisfaction relation $v, i \vDash_\sigma \alpha$ in the usual way (Figure 1). Finally, 
the earliest time-point $\mathsf{ETP}_\sigma(\tau)$ of $\tau \in \mathbb{N}$ on $\sigma$ is the smallest time-point $i$ such that 
$\tau_i \ge \tau$. Analogously, the latest time-point $\mathsf{LTP}_\sigma(\tau)$ of $\tau \ge \tau_0$ on $\sigma$ is the largest $i$ such that 
$\tau_i \le \tau$. We omit the stream $\sigma$ (e.g., $\vDash$, $\mathsf{ETP}(\tau)$ and $\mathsf{LTP}(\tau)$) if it is clear from the context. 
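
The earliest and latest time-point of a time-stamp can be computed by binary search on the monotone time-stamps. The Java sketch below works on a finite prefix and is meant as an illustration only: the class `TimePoints`, the array `ts` of time-stamps, and the convention of returning `ts.length` when no suitable time-point exists in the prefix are assumptions of this sketch.

```java
final class TimePoints {
    /** Smallest i with ts[i] >= tau, or ts.length if the prefix contains no such time-point. */
    static int etp(long[] ts, long tau) {
        int lo = 0, hi = ts.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (ts[mid] >= tau) hi = mid; else lo = mid + 1;
        }
        return lo;
    }

    /** Largest i within the prefix with ts[i] <= tau; requires tau >= ts[0]. */
    static int ltp(long[] ts, long tau) {
        if (ts.length == 0 || tau < ts[0]) {
            throw new IllegalArgumentException("tau must be at least the first time-stamp");
        }
        int lo = 0, hi = ts.length;   // invariant: ts[lo] <= tau and (hi == ts.length or ts[hi] > tau)
        while (hi - lo > 1) {
            int mid = (lo + hi) >>> 1;
            if (ts[mid] <= tau) lo = mid; else hi = mid;
        }
        return lo;
    }
}
```

On a prefix, the value returned by `ltp` is only final once a time-stamp larger than tau has been observed, since later events may still carry the same time-stamp.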


2 Proof System 


We introduce a local proof system for MFOTL (Figure 2). "Local" means here that the 
proof system does not talk about satisfiability in general, but rather about the formula's 
satisfaction or violation for a fixed stream, assignment, and time-point. 

Our proof system consists of two mutually dependent judgments, $\vdash^+_\sigma$ and $\vdash^-_\sigma$ (again, 
$\sigma$ is omitted when clear), that characterize a formula's satisfaction $v, i \vDash \alpha$ and violation 
$v, i \nvDash \alpha$ relations for assignment $v$, stream $\sigma$, and time-point $i$. The rules of our proof 
system closely follow the MFOTL semantics (Figure 1) and extend the proof system used 
by Lima et al. with assignments (that are mostly passed around without modification) 
and the rules for quantifiers (which modify the assignments). The rules for atomic 
predicates and Boolean constants and operators are self-explanatory: e.g., predicates 
are satisfied if a matching event is present in the trace; a conjunction is satisfied if both 
conjuncts are satisfied; a conjunction is violated if either of the conjuncts is violated. 

The rule $\exists^+$ states that for $v$ to satisfy $\exists x.\ \alpha$ at $i$, it suffices to provide a domain value 
$d$ such that the updated assignment $v[x \mapsto d]$ setting $x$ to $d$ satisfies $\alpha$ at $i$. Conversely, 
$\exists^-$ asserts that the violation of $\exists x.\ \alpha$ under $v$ at $i$ requires showing that all domain values 
make $v[x \mapsto d]$ violate $\alpha$ at $i$. Since the universal quantifier is dual to the existential one, 
the rules $\forall^-$ and $\forall^+$ exchange the relations $\vdash^+$ and $\vdash^-$ compared to $\exists^+$ and $\exists^-$. 
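Each rule is written as premises over conclusion, followed by the rule's name.

$\dfrac{}{v, i \vdash^+ \mathsf{tt}}\ \mathsf{tt}^+$
$\dfrac{}{v, i \vdash^- \mathsf{ff}}\ \mathsf{ff}^-$
$\dfrac{p(\llbracket \bar{t} \rrbracket_v) \in \Gamma_i}{v, i \vdash^+ p(\bar{t})}\ p^+$
$\dfrac{p(\llbracket \bar{t} \rrbracket_v) \notin \Gamma_i}{v, i \vdash^- p(\bar{t})}\ p^-$
$\dfrac{v(x) = c}{v, i \vdash^+ x \approx c}\ {\approx}^+$
$\dfrac{v(x) \ne c}{v, i \vdash^- x \approx c}\ {\approx}^-$
$\dfrac{v, i \vdash^- \alpha}{v, i \vdash^+ \neg\alpha}\ \neg^+$
$\dfrac{v, i \vdash^+ \alpha}{v, i \vdash^- \neg\alpha}\ \neg^-$
$\dfrac{v, i \vdash^+ \alpha \quad v, i \vdash^+ \beta}{v, i \vdash^+ \alpha \land \beta}\ \land^+$
$\dfrac{v, i \vdash^- \alpha}{v, i \vdash^- \alpha \land \beta}\ \land^-_L$
$\dfrac{v, i \vdash^- \beta}{v, i \vdash^- \alpha \land \beta}\ \land^-_R$
$\dfrac{v, i \vdash^+ \alpha}{v, i \vdash^+ \alpha \lor \beta}\ \lor^+_L$
$\dfrac{v, i \vdash^+ \beta}{v, i \vdash^+ \alpha \lor \beta}\ \lor^+_R$
$\dfrac{v, i \vdash^- \alpha \quad v, i \vdash^- \beta}{v, i \vdash^- \alpha \lor \beta}\ \lor^-$
$\dfrac{v, i \vdash^- \alpha}{v, i \vdash^+ \alpha \to \beta}\ {\to}^+_L$
$\dfrac{v, i \vdash^+ \beta}{v, i \vdash^+ \alpha \to \beta}\ {\to}^+_R$
$\dfrac{v, i \vdash^+ \alpha \quad v, i \vdash^- \beta}{v, i \vdash^- \alpha \to \beta}\ {\to}^-$
$\dfrac{v[x \mapsto d], i \vdash^+ \alpha}{v, i \vdash^+ \exists x.\ \alpha}\ \exists^+$
$\dfrac{\forall d.\ v[x \mapsto d], i \vdash^+ \alpha}{v, i \vdash^+ \forall x.\ \alpha}\ \forall^+$
$\dfrac{\forall d.\ v[x \mapsto d], i \vdash^- \alpha}{v, i \vdash^- \exists x.\ \alpha}\ \exists^-$
$\dfrac{v[x \mapsto d], i \vdash^- \alpha}{v, i \vdash^- \forall x.\ \alpha}\ \forall^-$
$\dfrac{i > 0 \quad \tau_i - \tau_{i-1} \in I \quad v, i-1 \vdash^+ \alpha}{v, i \vdash^+ \bullet_I \alpha}\ \bullet^+$
$\dfrac{}{v, 0 \vdash^- \bullet_I \alpha}\ \bullet^-_0$
$\dfrac{i > 0 \quad \tau_i - \tau_{i-1} < I}{v, i \vdash^- \bullet_I \alpha}\ \bullet^-_{<I}$
$\dfrac{i > 0 \quad \tau_i - \tau_{i-1} > I}{v, i \vdash^- \bullet_I \alpha}\ \bullet^-_{>I}$
$\dfrac{i > 0 \quad v, i-1 \vdash^- \alpha}{v, i \vdash^- \bullet_I \alpha}\ \bullet^-$
$\dfrac{\tau_{i+1} - \tau_i \in I \quad v, i+1 \vdash^+ \alpha}{v, i \vdash^+ \bigcirc_I \alpha}\ \bigcirc^+$
$\dfrac{\tau_{i+1} - \tau_i < I}{v, i \vdash^- \bigcirc_I \alpha}\ \bigcirc^-_{<I}$
$\dfrac{\tau_{i+1} - \tau_i > I}{v, i \vdash^- \bigcirc_I \alpha}\ \bigcirc^-_{>I}$
$\dfrac{v, i+1 \vdash^- \alpha}{v, i \vdash^- \bigcirc_I \alpha}\ \bigcirc^-$
$\dfrac{j \le i \quad \tau_i - \tau_j \in I \quad v, j \vdash^+ \alpha}{v, i \vdash^+ \blacklozenge_I \alpha}\ \blacklozenge^+$
$\dfrac{\tau_i < \tau_0 + I}{v, i \vdash^- \blacklozenge_I \alpha}\ \blacklozenge^-_{<I}$
$\dfrac{\tau_i \ge \tau_0 + I \quad \forall k \in [E^p_i(I), L^p_i(I)].\ v, k \vdash^- \alpha}{v, i \vdash^- \blacklozenge_I \alpha}\ \blacklozenge^-$
$\dfrac{j \ge i \quad \tau_j - \tau_i \in I \quad v, j \vdash^+ \alpha}{v, i \vdash^+ \lozenge_I \alpha}\ \lozenge^+$
$\dfrac{\forall k \in [E^f_i(I), L^f_i(I)].\ v, k \vdash^- \alpha}{v, i \vdash^- \lozenge_I \alpha}\ \lozenge^-$
$\dfrac{\tau_i < \tau_0 + I}{v, i \vdash^+ \blacksquare_I \alpha}\ \blacksquare^+_{<I}$
$\dfrac{\tau_i \ge \tau_0 + I \quad \forall k \in [E^p_i(I), L^p_i(I)].\ v, k \vdash^+ \alpha}{v, i \vdash^+ \blacksquare_I \alpha}\ \blacksquare^+$
$\dfrac{j \le i \quad \tau_i - \tau_j \in I \quad v, j \vdash^- \alpha}{v, i \vdash^- \blacksquare_I \alpha}\ \blacksquare^-$
$\dfrac{\forall k \in [E^f_i(I), L^f_i(I)].\ v, k \vdash^+ \alpha}{v, i \vdash^+ \square_I \alpha}\ \square^+$
$\dfrac{j \ge i \quad \tau_j - \tau_i \in I \quad v, j \vdash^- \alpha}{v, i \vdash^- \square_I \alpha}\ \square^-$
$\dfrac{j \le i \quad \tau_i - \tau_j \in I \quad v, j \vdash^+ \beta \quad \forall k \in (j, i].\ v, k \vdash^+ \alpha}{v, i \vdash^+ \alpha \mathbin{S_I} \beta}\ S^+$
$\dfrac{\tau_i < \tau_0 + I}{v, i \vdash^- \alpha \mathbin{S_I} \beta}\ S^-_{<I}$
$\dfrac{\tau_i \ge \tau_0 + I \quad \forall k \in [E^p_i(I), L^p_i(I)].\ v, k \vdash^- \beta}{v, i \vdash^- \alpha \mathbin{S_I} \beta}\ S^-_\infty$
$\dfrac{E^p_i(I) \le j \le i \quad \tau_i \ge \tau_0 + I \quad v, j \vdash^- \alpha \quad \forall k \in [j, L^p_i(I)].\ v, k \vdash^- \beta}{v, i \vdash^- \alpha \mathbin{S_I} \beta}\ S^-$
$\dfrac{i \le j \quad \tau_j - \tau_i \in I \quad v, j \vdash^+ \beta \quad \forall k \in [i, j).\ v, k \vdash^+ \alpha}{v, i \vdash^+ \alpha \mathbin{U_I} \beta}\ U^+$
$\dfrac{\forall k \in [E^f_i(I), L^f_i(I)].\ v, k \vdash^- \beta}{v, i \vdash^- \alpha \mathbin{U_I} \beta}\ U^-_\infty$
$\dfrac{i \le j \le L^f_i(I) \quad v, j \vdash^- \alpha \quad \forall k \in [E^f_i(I), j].\ v, k \vdash^- \beta}{v, i \vdash^- \alpha \mathbin{U_I} \beta}\ U^-$


Fig. 2: Local proof system for MFOTL on a fixed stream $\sigma = (\tau_i, \Gamma_i)_{i \in \mathbb{N}}$.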


The rules $\blacklozenge^+$ and $\lozenge^+$ are mere restatements of the MFOTL semantics. Since the operators
$\blacksquare_I$ and $\square_I$ are respectively dual to $\blacklozenge_I$ and $\lozenge_I$, their violation rules $\blacksquare^-$ and $\square^-$ once
again exchange $\vdash^+$ and $\vdash^-$ compared to $\blacklozenge^+$ and $\lozenge^+$. The rule $\blacksquare^+_{<I}$ accounts for the vacuous
truth of the operator $\blacksquare_I$ near the start of the stream (when no time-points fall within
the interval $I$). Dually, the rule $\blacklozenge^-_{<I}$ asserts the violation of $\blacklozenge_I$ near the start of the stream.
The remaining rules $\blacklozenge^-$, $\lozenge^-$, $\blacksquare^+$, and $\square^+$ use the notation $E^p_i(I)$, $L^p_i(I)$, $E^f_i(I)$, and $L^f_i(I)$ to
refer to time-points of particular interest relative to the current time-point $i$. Specifically,
for a future formula $F_I\,\alpha$ with $F \in \{\bigcirc, \lozenge, \square\}$ and interval $I = [a, b]$ or $I = [a, b)$ such
that $b \ne \infty$, the formula's semantics at time-point $i$ may need to refer to any time-point
with time-stamp in $[\tau_i + a, \dots, \tau_i + b]$. The latest such time-point is $L^f_i(I) = \mathsf{LTP}(\tau_i + b)$
while the earliest one is $E^f_i(I) = \max(i, \mathsf{ETP}(\tau_i + a))$. For past operators $P \in \{\bullet, \blacklozenge, \blacksquare\}$,
the relevant time-stamp interval is $[\tau_i - b, \dots, \tau_i - a]$ and the interval's earliest time-point
is $E^p_i(I) = \mathsf{ETP}(\tau_i - b)$ and its latest time-point is $L^p_i(I) = \min(i, \mathsf{LTP}(\tau_i - a))$.
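
These time-point bounds can be computed by binary search over the time-stamps. The following Python sketch is illustrative only: it assumes a finite prefix `ts` of the stream's time-stamps and a closed interval `[a, b]`, and all function names are hypothetical rather than taken from the formalization. On a finite prefix, a result is final only once the prefix extends beyond the queried time-stamp.

```python
from bisect import bisect_left, bisect_right

def etp(ts, tau):
    """Earliest time-point i with ts[i] >= tau (ts is sorted)."""
    return bisect_left(ts, tau)

def ltp(ts, tau):
    """Latest time-point i with ts[i] <= tau; only defined for tau >= ts[0]."""
    if tau < ts[0]:
        raise ValueError("LTP is only defined for tau >= ts[0]")
    return bisect_right(ts, tau) - 1

def past_window(ts, i, a, b):
    """Earliest and latest time-point relevant to a past operator at i."""
    return etp(ts, ts[i] - b), min(i, ltp(ts, ts[i] - a))

def future_window(ts, i, a, b):
    """Earliest and latest time-point relevant to a future operator at i (finite b)."""
    return max(i, etp(ts, ts[i] + a)), ltp(ts, ts[i] + b)

ts = [0, 0, 4, 10]
print(past_window(ts, 3, 0, 7))  # (2, 3)
```

Note that `past_window` raises an error when the time-stamp at `i` is smaller than the first time-stamp plus `a`; this is exactly the situation covered by the separate "near the start of the stream" rules.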

Proof trees emerging from repeated application of the rules in our proof system 
contain all the necessary information to explain why a formula is satisfied or violated. In 
other words, our proof system is sound and complete, i.e., the following result holds. 


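Theorem 1. Let $\alpha$ be a formula, $v$ a variable assignment, $i \in \mathbb{N}$ a time-point, and
$\sigma = (\tau_i, \Gamma_i)_{i \in \mathbb{N}}$ a trace. Then $v, i \vdash^+_\sigma \alpha \longleftrightarrow v, i \models_\sigma \alpha$ and $v, i \vdash^-_\sigma \alpha \longleftrightarrow v, i \not\models_\sigma \alpha$.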


We have formalized and verified this result in Isabelle/HOL. 


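Example 1. Consider the standard publish-approve example requiring that any file
$f$ published by an author $a$ must first be approved by a manager $m$ of $a$ within the
previous seven days. The formalization of this policy as a closed MFOTL formula is:


\[ \varphi = \forall a.\ \forall f.\ \mathsf{publish}(a, f) \to \bigl(\blacklozenge_{[0,7]}\, \exists m.\ (\neg \mathsf{mgr}_F(m, a) \mathbin{S} \mathsf{mgr}_S(m, a)) \land \mathsf{approve}(m, f)\bigr). \]


Here, the events $\mathsf{mgr}_S(m, a)$ and $\mathsf{mgr}_F(m, a)$ mark $m$ starting and finishing being $a$'s
manager. Formally, $m$ is currently a manager of $a$ if $m$ started being $a$'s manager in the
past and has not finished being $a$'s manager since. Thus, the manager relation changes
over time. Consider the stream $(\tau_i, \Gamma_i)_{i \in \mathbb{N}}$, where $\tau_0 = \tau_1 = 0$, $\tau_2 = 4$, $\tau_3 = 10$, and


$\Gamma_0 = \{\mathsf{mgr}_S(\mathsf{Mallory}, \mathsf{Alice}),\ \mathsf{mgr}_S(\mathsf{Merlin}, \mathsf{Bob}),\ \mathsf{mgr}_S(\mathsf{Merlin}, \mathsf{Charlie})\}$, and

$\Gamma_1 = \{\mathsf{approve}(\mathsf{Mallory}, 152)\}$, and

$\Gamma_2 = \{\mathsf{approve}(\mathsf{Merlin}, 163),\ \mathsf{publish}(\mathsf{Alice}, 160),\ \mathsf{mgr}_F(\mathsf{Merlin}, \mathsf{Charlie})\}$, and

$\Gamma_3 = \{\mathsf{approve}(\mathsf{Merlin}, 187),\ \mathsf{publish}(\mathsf{Bob}, 163),\ \mathsf{publish}(\mathsf{Alice}, 163),\ \mathsf{publish}(\mathsf{Charlie}, 163),\ \mathsf{publish}(\mathsf{Charlie}, 152)\}$.

In the following we abbreviate the subformulas of $\varphi$ as follows: $\varphi_L = \mathsf{publish}(a, f)$,
$\varphi_1 = \neg \mathsf{mgr}_F(m, a) \mathbin{S} \mathsf{mgr}_S(m, a)$, $\varphi_2 = \mathsf{approve}(m, f)$, $\varphi_3 = \exists m.\ \varphi_1 \land \varphi_2$, $\varphi_R = \blacklozenge_{[0,7]}\, \varphi_3$, and
$\varphi' = \varphi_L \to \varphi_R$. The following proof tree shows that $\varphi$ is violated at time-point 3 for any $v$: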
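\[
\dfrac{
  \dfrac{
    \dfrac{
      \dfrac{\mathsf{publish}(\mathsf{Charlie}, 152) \in \Gamma_3}{v', 3 \vdash^+ \varphi_L}\ p^+
      \qquad
      \dfrac{
        \dfrac{
          \dfrac{
            \dfrac{\mathsf{approve}(d, 152) \notin \Gamma_i}{v'[m \mapsto d], i \vdash^- \varphi_2}\ p^-
          }{v'[m \mapsto d], i \vdash^- \varphi_1 \land \varphi_2}\ \land^-_R
        }{v', i \vdash^- \varphi_3}\ \exists^-
      }{v', 3 \vdash^- \varphi_R}\ \blacklozenge^-
    }{v', 3 \vdash^- \varphi_L \to \varphi_R}\ {\to}^-
  }{v[a \mapsto \mathsf{Charlie}], 3 \vdash^- \forall f.\ \varphi'}\ \forall^-
}{v, 3 \vdash^- \forall a.\ \forall f.\ \varphi'}\ \forall^-
\]
where $v'$ abbreviates $v[a \mapsto \mathsf{Charlie}, f \mapsto 152]$.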


Given $\varphi_R$'s temporal constraint, we note that $\tau_3 \ge 0$ and need to check $v, i \vdash^- \varphi_3$ for the
time-points $i \in \{2, 3\}$ (as $[E^p_3([0, 7]), L^p_3([0, 7])] = \{2, 3\}$). Both subproofs are identical,
so we parameterize them over $i$. In addition, the $\exists^-$ subproofs are valid for an arbitrary
manager $d \in \mathbb{D}$ (abbreviating infinite branching over all possible domain values).
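
As a sanity check of this example, the policy can be evaluated directly on the four time-points. The Python sketch below is not the proof system and builds no proof objects: it is a naive evaluator written only for this one formula, with events encoded as tuples and all names (`db`, `is_manager`, `approved`, `violations`) chosen for illustration. It restricts the existential over managers to those occurring in an approve event, which is sufficient because approve(m, f) must hold for the witness.

```python
ts = [0, 0, 4, 10]
db = [
    {("mgrS", "Mallory", "Alice"), ("mgrS", "Merlin", "Bob"), ("mgrS", "Merlin", "Charlie")},
    {("approve", "Mallory", 152)},
    {("approve", "Merlin", 163), ("publish", "Alice", 160), ("mgrF", "Merlin", "Charlie")},
    {("approve", "Merlin", 187), ("publish", "Bob", 163), ("publish", "Alice", 163),
     ("publish", "Charlie", 163), ("publish", "Charlie", 152)},
]

def is_manager(m, a, j):
    """(not mgrF(m, a)) Since mgrS(m, a) at j: a start at some k <= j, no finish in (k, j]."""
    return any(
        ("mgrS", m, a) in db[k]
        and all(("mgrF", m, a) not in db[l] for l in range(k + 1, j + 1))
        for k in range(j + 1)
    )

def approved(a, f, i, width=7):
    """Some manager of a approved f at a time-point j <= i within `width` time units."""
    for j in range(i + 1):
        if ts[i] - ts[j] <= width:
            for e in db[j]:
                if e[0] == "approve" and e[2] == f and is_manager(e[1], a, j):
                    return True
    return False

def violations(i):
    """Pairs (a, f) published at i without a matching approval."""
    return sorted((e[1], e[2]) for e in db[i]
                  if e[0] == "publish" and not approved(e[1], e[2], i))

# The assignment used in the proof tree is among the violations at time-point 3.
assert ("Charlie", 152) in violations(3)
```

Unlike a proof tree, such an evaluator only reports that a violation exists; it does not record which time-points and subformulas justify the verdict.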
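\[
\begin{array}{rcl}
\mathfrak{sp} &=& \mathsf{tt}^+(\mathbb{N}) \mid p^+(\mathbb{N}, E, \bar{t}) \mid \neg^+(\mathfrak{vp}) \mid \land^+(\mathfrak{sp}, \mathfrak{sp}) \mid \lor^+_L(\mathfrak{sp}) \mid \lor^+_R(\mathfrak{sp}) \mid {\to}^+_L(\mathfrak{vp}) \mid {\to}^+_R(\mathfrak{sp}) \\
&\mid& \forall^+(V, \biguplus_{\mathbb{D}}(\mathfrak{sp})) \mid \exists^+(V, \mathbb{D}, \mathfrak{sp}) \mid \bullet^+(\mathfrak{sp}) \mid \bigcirc^+(\mathfrak{sp}) \mid \blacklozenge^+(\mathbb{N}, \mathfrak{sp}) \mid \lozenge^+(\mathbb{N}, \mathfrak{sp}) \\
&\mid& \blacksquare^+_{<I}(\mathbb{N}) \mid \blacksquare^+(\mathbb{N}, \overline{\mathfrak{sp}}) \mid \square^+(\mathbb{N}, \overline{\mathfrak{sp}}) \mid S^+(\mathfrak{sp}, \overline{\mathfrak{sp}}) \mid U^+(\mathfrak{sp}, \overline{\mathfrak{sp}}), \\
\mathfrak{vp} &=& \mathsf{ff}^-(\mathbb{N}) \mid p^-(\mathbb{N}, E, \bar{t}) \mid \neg^-(\mathfrak{sp}) \mid \land^-_L(\mathfrak{vp}) \mid \land^-_R(\mathfrak{vp}) \mid \lor^-(\mathfrak{vp}, \mathfrak{vp}) \mid {\to}^-(\mathfrak{sp}, \mathfrak{vp}) \\
&\mid& \forall^-(V, \mathbb{D}, \mathfrak{vp}) \mid \exists^-(V, \biguplus_{\mathbb{D}}(\mathfrak{vp})) \mid \bullet^-(\mathfrak{vp}) \mid \bullet^-_{<I}(\mathbb{N}) \mid \bullet^-_{>I}(\mathbb{N}) \mid \bullet^-_0 \mid \bigcirc^-(\mathfrak{vp}) \mid \bigcirc^-_{<I}(\mathbb{N}) \\
&\mid& \bigcirc^-_{>I}(\mathbb{N}) \mid \blacklozenge^-_{<I}(\mathbb{N}) \mid \blacklozenge^-(\mathbb{N}, \overline{\mathfrak{vp}}) \mid \lozenge^-(\mathbb{N}, \overline{\mathfrak{vp}}) \mid \blacksquare^-(\mathbb{N}, \mathfrak{vp}) \mid \square^-(\mathbb{N}, \mathfrak{vp}) \\
&\mid& S^-_{<I}(\mathbb{N}) \mid S^-(\mathbb{N}, \mathfrak{vp}, \overline{\mathfrak{vp}}) \mid S^-_\infty(\mathbb{N}, \overline{\mathfrak{vp}}) \mid U^-(\mathbb{N}, \mathfrak{vp}, \overline{\mathfrak{vp}}) \mid U^-_\infty(\mathbb{N}, \overline{\mathfrak{vp}})
\end{array}
\]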


Fig. 3: Grammar for our proof objects. 


3 Proof Object Checker 


This section introduces our proof objects and their checker: finite data-representations of 
our proof system's trees, and an algorithm that certifies if a given proof object faithfully 
proves the satisfaction or violation of a formula under a given assignment and stream. 
We discuss the soundness, completeness, and executability of these constructions. 

To algorithmically manipulate proof trees, we define an explicit representation of
satisfactions $\mathfrak{sp}$ and violations $\mathfrak{vp}$ via the grammar in Figure 3, where each constructor
corresponds to a proof rule of our proof system (Figure 2), and its arguments represent
subproofs and parameters that are part of a rule. The disjoint union $\mathfrak{p} = \mathfrak{sp} \uplus \mathfrak{vp}$ is our
type of proof objects. The proof object $\forall^+$ requires information about satisfactions for
all domain elements $d \in \mathbb{D}$, which we finitely represent with our valued partitions
$P \in \biguplus_{\mathbb{D}}(\mathfrak{sp})$. Recall that a partition $P$ of a set $A$ is a collection of non-empty, pairwise disjoint
subsets of $A$ that cover $A$. That is, $D_i \cap D_j = \emptyset$ for $D_i, D_j \in P$ with $D_i \ne D_j$ and $\bigcup P = A$.
Partitions enable us to finitely represent all elements of the domain using finitely many
finite sets and the co-finite complement of their union. In valued partitions $P \in \biguplus_{\mathbb{D}}(\mathfrak{sp})$,
each set in the partition is tagged with a satisfaction explaining why its elements satisfy
the argument of a universally quantified formula. Formally, our valued partitions
$P \in \biguplus_{\mathbb{D}}(Z)$ are lists of pairs of a set $D_i$ and a value $z \in Z$ from a given set $Z$ such that the sets
$D_i$ form a partition of $\mathbb{D}$. Similarly, $\exists^-$ stores a valued partition $P \in \biguplus_{\mathbb{D}}(\mathfrak{vp})$ of violations.
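
To make the finite representation concrete, here is a small Python sketch of sets that are either finite or co-finite over an infinite domain, together with a check that a list of tagged sets is a valued partition. The class name `DSet` and the function `is_valued_partition` are assumptions made for this illustration. The check relies on one observation: over an infinite domain, a partition into finite and co-finite sets contains exactly one co-finite set, and the finite sets must be pairwise disjoint and jointly equal to the elements that the co-finite set excludes.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DSet:
    """A finite set `elems`, or (if complement=True) everything except `elems`."""
    elems: frozenset
    complement: bool = False

    def __contains__(self, d):
        return (d in self.elems) != self.complement

def is_valued_partition(parts):
    """parts: list of (DSet, value) pairs over an infinite domain."""
    finite = [s.elems for s, _ in parts if not s.complement]
    cofinite = [s.elems for s, _ in parts if s.complement]
    # Exactly one co-finite block is needed to cover an infinite domain,
    # and no block may be empty.
    if len(cofinite) != 1 or any(not f for f in finite):
        return False
    union = set()
    for f in finite:
        if union & f:          # two finite blocks overlap
            return False
        union |= f
    # Disjointness from and coverage with the co-finite block.
    return union == set(cofinite[0])

parts = [
    (DSet(frozenset({"Alice"})), "proof for Alice"),
    (DSet(frozenset({"Bob"})), "proof for Bob"),
    (DSet(frozenset({"Alice", "Bob"}), complement=True), "proof for everyone else"),
]
print(is_valued_partition(parts))  # True
```

The trivial partition, consisting of the single co-finite block that excludes nothing, is accepted as well.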

Our proof objects $p \in \mathfrak{p}$ represent satisfactions or violations at a certain time-point.
We define a function $\mathsf{tp}(p)$ (omitted) to compute this time-point. Either this information
can be obtained recursively (e.g., $\mathsf{tp}(\bigcirc^+(p)) = \mathsf{tp}(p) - 1$) or, in cases where it cannot, it
is stored directly in the proof objects (e.g., $\mathsf{tp}(\mathsf{tt}^+(i)) = i$). We lift $\mathsf{tp}$ to sequences (yielding
sequences of time-points) and valued partitions as $\mathsf{tp}(P) = \mathsf{tp}(p_1)$, where $(D_1, p_1)$
is the partition $P$'s first entry. To characterize valid proof objects, we define the relation
$\vdash_\sigma$ (Figure 4) that checks that proof objects constitute correct applications of our proof
system's rules. Here, $\vdash$ is not an executable algorithm yet since the proof objects $\forall^+$ and
$\exists^-$ require a recursive call for each element of each set in the partition, and at least one of
such sets is infinite for infinite domains. We will improve on this aspect after an example.


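Example 2. The following violation proof object $p$ at time-point 3 (i.e., $\mathsf{tp}(p) = 3$) is
valid for formula $\varphi$ on stream $\sigma$ from Example 1 (i.e., $v, p \vdash_\sigma \varphi$ for any assignment $v$):
$p = \forall^-(a, \mathsf{Charlie}, \forall^-(f, 152, p_\to))$, where
$p_\to = {\to}^-(p_L, p_R)$, $p_L = p^+(3, \mathsf{publish}, [a, f])$,
$p_R = \blacklozenge^-(3, [\exists^-(m, [(\mathbb{D}, p_2)]), \exists^-(m, [(\mathbb{D}, p_3)])])$, and
$p_i = \land^-_R(p^-(i, \mathsf{approve}, [m, f]))$ for $i \in \{2, 3\}$.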
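$v, \mathsf{tt}^+(i) \vdash \mathsf{tt}$
$v, \mathsf{ff}^-(i) \vdash \mathsf{ff}$
$v, p^+(i, p, \bar{t}) \vdash p(\bar{t})$ iff $p(\llbracket \bar{t} \rrbracket_v) \in \Gamma_i$
$v, p^-(i, p, \bar{t}) \vdash p(\bar{t})$ iff $p(\llbracket \bar{t} \rrbracket_v) \notin \Gamma_i$
$v, {\approx}^+(i, x, c) \vdash x \approx c$ iff $v(x) = c$
$v, {\approx}^-(i, x, c) \vdash x \approx c$ iff $v(x) \ne c$
$v, \neg^+(vp) \vdash \neg\alpha$ iff $v, vp \vdash \alpha$
$v, \neg^-(sp) \vdash \neg\alpha$ iff $v, sp \vdash \alpha$
$v, {\to}^+_L(vp) \vdash \alpha \to \beta$ iff $v, vp \vdash \alpha$
$v, {\to}^+_R(sp) \vdash \alpha \to \beta$ iff $v, sp \vdash \beta$
$v, \land^-_L(vp) \vdash \alpha \land \beta$ iff $v, vp \vdash \alpha$
$v, \land^-_R(vp) \vdash \alpha \land \beta$ iff $v, vp \vdash \beta$
$v, \lor^+_L(sp) \vdash \alpha \lor \beta$ iff $v, sp \vdash \alpha$
$v, \lor^+_R(sp) \vdash \alpha \lor \beta$ iff $v, sp \vdash \beta$
$v, \exists^+(x, d, sp) \vdash \exists x.\ \alpha$ iff $v[x \mapsto d], sp \vdash \alpha$
$v, \forall^-(x, d, vp) \vdash \forall x.\ \alpha$ iff $v[x \mapsto d], vp \vdash \alpha$
$v, \land^+(sp_1, sp_2) \vdash \alpha \land \beta$ iff $v, sp_1 \vdash \alpha$ and $v, sp_2 \vdash \beta$ and $\mathsf{tp}(sp_1) = \mathsf{tp}(sp_2)$
$v, \lor^-(vp_1, vp_2) \vdash \alpha \lor \beta$ iff $v, vp_1 \vdash \alpha$ and $v, vp_2 \vdash \beta$ and $\mathsf{tp}(vp_1) = \mathsf{tp}(vp_2)$
$v, {\to}^-(sp_1, vp_2) \vdash \alpha \to \beta$ iff $v, sp_1 \vdash \alpha$ and $v, vp_2 \vdash \beta$ and $\mathsf{tp}(sp_1) = \mathsf{tp}(vp_2)$
$v, \forall^+(x, P) \vdash \forall x.\ \alpha$ iff $\forall (D_k, sp_k) \in P.\ \mathsf{tp}(sp_k) = \mathsf{tp}(P)$ and $\forall d \in D_k.\ v[x \mapsto d], sp_k \vdash \alpha$
$v, \exists^-(x, P) \vdash \exists x.\ \alpha$ iff $\forall (D_k, vp_k) \in P.\ \mathsf{tp}(vp_k) = \mathsf{tp}(P)$ and $\forall d \in D_k.\ v[x \mapsto d], vp_k \vdash \alpha$
$v, \bullet^+(sp) \vdash \bullet_I \alpha$ iff $v, sp \vdash \alpha$ and $\mathsf{tp}(\bullet^+(sp)) = \mathsf{tp}(sp) + 1$ and $\tau_{\mathsf{tp}(\bullet^+(sp))} - \tau_{\mathsf{tp}(sp)} \in I$
$v, \bigcirc^+(sp) \vdash \bigcirc_I \alpha$ iff $v, sp \vdash \alpha$ and $\mathsf{tp}(\bigcirc^+(sp)) + 1 = \mathsf{tp}(sp)$ and $\tau_{\mathsf{tp}(sp)} - \tau_{\mathsf{tp}(\bigcirc^+(sp))} \in I$
$v, \blacklozenge^+(i, sp) \vdash \blacklozenge_I \alpha$ iff $v, sp \vdash \alpha$ and $i \ge \mathsf{tp}(sp)$ and $\tau_i - \tau_{\mathsf{tp}(sp)} \in I$
$v, \lozenge^+(i, sp) \vdash \lozenge_I \alpha$ iff $v, sp \vdash \alpha$ and $i \le \mathsf{tp}(sp)$ and $\tau_{\mathsf{tp}(sp)} - \tau_i \in I$
$v, \blacksquare^+(i, \overline{sp}) \vdash \blacksquare_I \alpha$ iff $(\forall sp \in \overline{sp}.\ v, sp \vdash \alpha)$ and $\mathsf{tp}(\overline{sp}) = [E^p_i(I), L^p_i(I)]$ and $\tau_i \ge \tau_0 + I$
$v, \square^+(i, \overline{sp}) \vdash \square_I \alpha$ iff $(\forall sp \in \overline{sp}.\ v, sp \vdash \alpha)$ and $\mathsf{tp}(\overline{sp}) = [E^f_i(I), L^f_i(I)]$
$v, S^+(sp, \overline{sp}) \vdash \alpha \mathbin{S_I} \beta$ iff $(\forall sp' \in \overline{sp}.\ v, sp' \vdash \alpha)$ and $v, sp \vdash \beta$ and $\mathsf{tp}(S^+(sp, \overline{sp})) \ge \mathsf{tp}(sp)$
    and $\mathsf{tp}(\overline{sp}) = [\mathsf{tp}(sp) + 1, \mathsf{tp}(S^+(sp, \overline{sp}))]$ and $\tau_{\mathsf{tp}(S^+(sp, \overline{sp}))} - \tau_{\mathsf{tp}(sp)} \in I$
$v, U^+(sp, \overline{sp}) \vdash \alpha \mathbin{U_I} \beta$ iff $(\forall sp' \in \overline{sp}.\ v, sp' \vdash \alpha)$ and $v, sp \vdash \beta$ and $\mathsf{tp}(U^+(sp, \overline{sp})) \le \mathsf{tp}(sp)$
    and $\mathsf{tp}(\overline{sp}) = [\mathsf{tp}(U^+(sp, \overline{sp})), \mathsf{tp}(sp))$ and $\tau_{\mathsf{tp}(sp)} - \tau_{\mathsf{tp}(U^+(sp, \overline{sp}))} \in I$
$v, \bullet^-_0 \vdash \bullet_I \alpha$ iff $\mathsf{tp}(\bullet^-_0) = 0$
$v, \bullet^-_{<I}(i) \vdash \bullet_I \alpha$ iff $i > 0$ and $\tau_i - \tau_{i-1} < I$
$v, \bullet^-_{>I}(i) \vdash \bullet_I \alpha$ iff $i > 0$ and $\tau_i - \tau_{i-1} > I$
$v, \bigcirc^-_{<I}(i) \vdash \bigcirc_I \alpha$ iff $\tau_{i+1} - \tau_i < I$
$v, \bigcirc^-_{>I}(i) \vdash \bigcirc_I \alpha$ iff $\tau_{i+1} - \tau_i > I$
$v, \blacksquare^+_{<I}(i) \vdash \blacksquare_I \alpha$ iff $\tau_i < \tau_0 + I$
$v, \blacklozenge^-_{<I}(i) \vdash \blacklozenge_I \alpha$ iff $\tau_i < \tau_0 + I$
$v, S^-_{<I}(i) \vdash \alpha \mathbin{S_I} \beta$ iff $\tau_i < \tau_0 + I$
$v, \bullet^-(vp) \vdash \bullet_I \alpha$ iff $v, vp \vdash \alpha$ and $\mathsf{tp}(\bullet^-(vp)) = \mathsf{tp}(vp) + 1$
$v, \bigcirc^-(vp) \vdash \bigcirc_I \alpha$ iff $v, vp \vdash \alpha$ and $\mathsf{tp}(\bigcirc^-(vp)) + 1 = \mathsf{tp}(vp)$
$v, \blacklozenge^-(i, \overline{vp}) \vdash \blacklozenge_I \alpha$ iff $(\forall vp \in \overline{vp}.\ v, vp \vdash \alpha)$ and $\mathsf{tp}(\overline{vp}) = [E^p_i(I), L^p_i(I)]$ and $\tau_i \ge \tau_0 + I$
$v, \lozenge^-(i, \overline{vp}) \vdash \lozenge_I \alpha$ iff $(\forall vp \in \overline{vp}.\ v, vp \vdash \alpha)$ and $\mathsf{tp}(\overline{vp}) = [E^f_i(I), L^f_i(I)]$
$v, \blacksquare^-(i, vp) \vdash \blacksquare_I \alpha$ iff $v, vp \vdash \alpha$ and $i \ge \mathsf{tp}(vp)$ and $\tau_i - \tau_{\mathsf{tp}(vp)} \in I$
$v, \square^-(i, vp) \vdash \square_I \alpha$ iff $v, vp \vdash \alpha$ and $i \le \mathsf{tp}(vp)$ and $\tau_{\mathsf{tp}(vp)} - \tau_i \in I$
$v, S^-_\infty(i, \overline{vp}) \vdash \alpha \mathbin{S_I} \beta$ iff $(\forall vp \in \overline{vp}.\ v, vp \vdash \beta)$ and $\mathsf{tp}(\overline{vp}) = [E^p_i(I), L^p_i(I)]$ and $\tau_i \ge \tau_0 + I$
$v, S^-(i, vp, \overline{vp}) \vdash \alpha \mathbin{S_I} \beta$ iff $(\forall vp' \in \overline{vp}.\ v, vp' \vdash \beta)$ and $v, vp \vdash \alpha$ and $E^p_i(I) \le \mathsf{tp}(vp) \le i$
    and $\mathsf{tp}(\overline{vp}) = [\mathsf{tp}(vp), L^p_i(I)]$ and $\tau_i \ge \tau_0 + I$
$v, U^-_\infty(i, \overline{vp}) \vdash \alpha \mathbin{U_I} \beta$ iff $(\forall vp \in \overline{vp}.\ v, vp \vdash \beta)$ and $\mathsf{tp}(\overline{vp}) = [E^f_i(I), L^f_i(I)]$
$v, U^-(i, vp, \overline{vp}) \vdash \alpha \mathbin{U_I} \beta$ iff $(\forall vp' \in \overline{vp}.\ v, vp' \vdash \beta)$ and $v, vp \vdash \alpha$ and $i \le \mathsf{tp}(vp) \le L^f_i(I)$
    and $\mathsf{tp}(\overline{vp}) = [E^f_i(I), \mathsf{tp}(vp)]$


Fig. 4: Proof checker for a fixed stream $\sigma = (\tau_i, \Gamma_i)_{i \in \mathbb{N}}$.


Indeed, we use the definition in Figure 4 to certify that $v, p \vdash_\sigma \varphi$:

$v, p \vdash \varphi$
  iff $v[a \mapsto \mathsf{Charlie}], \forall^-(f, 152, p_\to) \vdash \forall f.\ \varphi_L \to \varphi_R$
  iff $v[a \mapsto \mathsf{Charlie}, f \mapsto 152], p_\to \vdash \varphi_L \to \varphi_R$
  iff $v[a \mapsto \mathsf{Charlie}, f \mapsto 152], p_L \vdash \varphi_L$ and $v[a \mapsto \mathsf{Charlie}, f \mapsto 152], p_R \vdash \varphi_R$ and $\mathsf{tp}(p_L) = 3 = \mathsf{tp}(p_R)$
  iff $v[a \mapsto \mathsf{Charlie}, f \mapsto 152], \exists^-(m, [(\mathbb{D}, p_i)]) \vdash \varphi_3$ for $i \in \{2, 3\}$
  iff $v[a \mapsto \mathsf{Charlie}, f \mapsto 152, m \mapsto d], p_i \vdash \varphi_1 \land \varphi_2$ for all $d \in \mathbb{D}$, $i \in \{2, 3\}$
  iff $\mathsf{approve}(d, 152) \notin \Gamma_i$ for all $d \in \mathbb{D}$, $i \in \{2, 3\}$, which is true.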
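\[
\begin{array}{l}
\mathsf{LRTP}\ i\ \mathsf{tt} = \mathsf{LRTP}\ i\ \mathsf{ff} = \mathsf{LRTP}\ i\ (p(\bar{t})) = \mathsf{LRTP}\ i\ (x \approx c) = i, \\
\mathsf{LRTP}\ i\ (Q x.\ \alpha) = \mathsf{LRTP}\ i\ (\neg\alpha) = \mathsf{LRTP}\ i\ \alpha \quad \text{for } Q \in \{\forall, \exists\}, \\
\mathsf{LRTP}\ i\ (\alpha \oplus \beta) = \max\ (\mathsf{LRTP}\ i\ \alpha)\ (\mathsf{LRTP}\ i\ \beta) \quad \text{for } \oplus \in \{\lor, \land, \to\}, \\
\mathsf{LRTP}\ i\ (\bullet_I \alpha) = \mathsf{LRTP}\ (i - 1)\ \alpha, \qquad \mathsf{LRTP}\ i\ (\bigcirc_I \alpha) = \mathsf{LRTP}\ (i + 1)\ \alpha, \\
\mathsf{LRTP}\ i\ (\lozenge_I \alpha) = \mathsf{LRTP}\ i\ (\square_I \alpha) = \mathsf{LRTP}\ (L^f_i(I))\ \alpha, \\
\mathsf{LRTP}\ i\ (\blacklozenge_I \alpha) = \mathsf{LRTP}\ i\ (\blacksquare_I \alpha) = \mathsf{LRTP}\ (\mathsf{LTP}^{p,\infty}_i\ I)\ \alpha, \\
\mathsf{LRTP}\ i\ (\alpha \mathbin{S_I} \beta) = \max\ (\mathsf{LRTP}\ i\ \alpha)\ (\mathsf{LRTP}\ (\mathsf{LTP}^{p,\infty}_i\ I)\ \beta), \\
\mathsf{LRTP}\ i\ (\alpha \mathbin{U_I} \beta) = \max\ (\mathsf{LRTP}\ (L^f_i(I) - 1)\ \alpha)\ (\mathsf{LRTP}\ (L^f_i(I))\ \beta), \quad \text{where} \\
\mathsf{LTP}^{p,\infty}_i\ I = (\text{if } \tau_i \ge \tau_0 + I \text{ then } L^p_i(I) \text{ else } 0).
\end{array}
\]


Fig. 5: The formula's latest relevant time-point at $i$ for a fixed stream $\sigma = (\tau_i, \Gamma_i)_{i \in \mathbb{N}}$.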


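We implicitly use in the above the true statements $\mathsf{publish}(\mathsf{Charlie}, 152) \in \Gamma_3$, $0 \le \tau_3$,
and $\mathsf{tp}([\exists^-(m, [(\mathbb{D}, p_2)]), \exists^-(m, [(\mathbb{D}, p_3)])]) = [2, 3] = [E^p_3(I), L^p_3(I)]$ for $I = [0, 7]$.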


Theorem 2. Fix a stream $\sigma$. The relation $\vdash$ is sound and complete in the sense that
$v, i \models \alpha$ iff there is a satisfaction $sp$ such that $v, sp \vdash \alpha$ and $\mathsf{tp}(sp) = i$. Similarly $v, i \not\models \alpha$
iff there is a violation $vp$ such that $v, vp \vdash \alpha$ and $\mathsf{tp}(vp) = i$.


We have established the above result in Isabelle. Below we sketch our overall approach
and highlight the main challenge. We show both soundness and completeness
by relating proof object validity ($\vdash$) to the proof system ($\vdash^+$ and $\vdash^-$), which we already
know to be sound and complete, i.e., related to the semantics $\models$. Soundness is easy as the
proof object directly provides the recipe for correctly applying the proof system rules.
Formally, if $v, sp \vdash \alpha$ then $v, \mathsf{tp}(sp) \vdash^+ \alpha$, and if $v, vp \vdash \alpha$, then $v, \mathsf{tp}(vp) \vdash^- \alpha$. The proof
follows immediately by mutual induction on the proof object structure.

Completeness of $\vdash$ requires us to provide a valid proof object just from knowing
$v, i \vdash^+ \alpha$ or $v, i \vdash^- \alpha$. We proceed by mutual induction on the derivations of $\vdash^+$ and $\vdash^-$.
Only two of the quantifier cases are challenging. For the satisfaction of the universal
quantifier (and similarly for the violation of $\exists$), we must construct a valued partition
with finitely many subproofs. However, the induction hypothesis yields a separate proof
object for every element of the domain $\mathbb{D}$, and all these proof objects may a priori be
different. The crucial observation is that for all values that do not occur in the stream
(or at least are not in reach of $\alpha$ with respect to a time-point $i$) we can reuse the same
proof object. To formalize this observation, we first define a formula's active domain at $i$,
written $\mathsf{AD}_i(\alpha)$, which formalizes the "in reach" intuition. To this end, we first define the
latest relevant time-point ($\mathsf{LRTP}\ i\ \alpha$) of $\alpha$ at $i$ (Figure 5). Intuitively, $\mathsf{LRTP}\ i\ \alpha$ marks the
largest time-point that may influence $\alpha$'s satisfaction at $i$. It exists, because we assume
that future temporal operators have bounded intervals. Based on this, we define:


\[ \mathsf{AD}_i(\alpha) = D(\alpha) \cup \bigcup_{k \le \mathsf{LRTP}\ i\ \alpha} \{ d \mid d \text{ appears in some } p(d_1, \dots, d_n) \in \Gamma_k \}. \]


Here we write $D(\alpha)$ for the set of constants $d \in \mathbb{D}$ occurring in subformulas of the form
$x \approx d$ in $\alpha$. (In contrast to constants occurring in atomic predicates, constants occurring in
equalities may appear in $\alpha$'s satisfying assignments even if they are not part of the trace.)
Note that $\mathsf{AD}_i(\alpha)$ is finite. The active domain lets us formalize the key observation:


Lemma 1. Fix a stream $\sigma$, a formula $\alpha$, a proof $p$, and two assignments $v$ and $v'$. Let
$i = \mathsf{tp}(p)$, $\mathsf{AD} = \mathsf{AD}_i(\alpha)$, and $V$ be the set of $\alpha$'s free variables. Assume that $v$ and $v'$
may only disagree on $V$ for values outside of the active domain at $i$, i.e.,
\[ \forall x \in V.\ v(x) = v'(x) \lor (v(x) \notin \mathsf{AD} \land v'(x) \notin \mathsf{AD}). \]
Then, $p$'s validity status is the same for both assignments, i.e., $v, p \vdash \alpha$ iff $v', p \vdash \alpha$.


We can now finish the $\forall^+$ case of the completeness proof. By the induction hypothesis,
there is a satisfaction $p(d) \in \mathfrak{sp}$ for each domain element $d \in \mathbb{D}$. Moreover,
$\{\{d\} \mid d \in \mathsf{AD}_i(\alpha)\} \cup \{\mathbb{D} \setminus \mathsf{AD}_i(\alpha)\}$ is a finite partition of $\mathbb{D}$. Hence, the list of pairings
$(\{d\}, p(d))$ for each $d \in \mathsf{AD}_i(\alpha)$ and $(\mathbb{D} \setminus \mathsf{AD}_i(\alpha), p(z))$ for some $z \in \mathbb{D} \setminus \mathsf{AD}_i(\alpha)$ (which
exists as $\mathbb{D}$ is infinite) is a valued partition. Moreover, all subproofs are valid for all the
values contained in the partition sets by combining the induction hypothesis with the
above congruence Lemma 1 (for $p = p(z)$), and thus so is the overall $\forall^+$ proof object.
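
The construction in this argument is mechanical, as the following Python sketch suggests. It assumes events encoded as tuples `(name, arg1, ..., argn)`, takes the latest relevant time-point `lrtp` as an input rather than computing it, and uses the hypothetical names `active_domain` and `forall_partition`; the co-finite block is encoded as the pair `("all_except", excluded_values)`.

```python
def active_domain(db, lrtp, constants=()):
    """Equality constants plus all values in events at time-points 0..lrtp."""
    ad = set(constants)
    for gamma in db[: lrtp + 1]:
        for _name, *args in gamma:
            ad.update(args)
    return ad

def forall_partition(ad, proof_for, fresh):
    """One singleton block per active value, one block for all other values.

    proof_for(d) yields the sub-proof for value d; `fresh` is any value
    outside the active domain, whose proof is reused for the whole remainder.
    """
    assert fresh not in ad
    blocks = [(frozenset({d}), proof_for(d)) for d in sorted(ad, key=str)]
    blocks.append((("all_except", frozenset(ad)), proof_for(fresh)))
    return blocks

db = [
    {("mgrS", "Mallory", "Alice"), ("mgrS", "Merlin", "Bob"), ("mgrS", "Merlin", "Charlie")},
    {("approve", "Mallory", 152)},
    {("approve", "Merlin", 163), ("publish", "Alice", 160), ("mgrF", "Merlin", "Charlie")},
    {("approve", "Merlin", 187), ("publish", "Bob", 163), ("publish", "Alice", 163),
     ("publish", "Charlie", 163), ("publish", "Charlie", 152)},
]
ad = active_domain(db, lrtp=3)
blocks = forall_partition(ad, lambda d: f"proof({d})", fresh="Zed")
print(len(ad), len(blocks))  # 9 10
```

Reusing the proof for one fresh value across the entire remainder is justified only by the congruence lemma; the sketch takes that for granted and does not check it.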

Lastly, we address the executability issue. The validity relation works with assignments
$v$ of values to variables. To avoid performing infinitely many recursive calls for the
$\forall^+$ and $\exists^-$ proof objects we now will work with set assignments $V$ of sets of values to
variables. We define a validity relation $V, p \vdash \alpha$ based on set assignments. The definition
is the same as the one of $v, p \vdash \alpha$ except for the predicate and the quantifier cases:


$V, p^+(i, p, \bar{t}) \vdash p(\bar{t})$ iff $\{p\} \times \llbracket \bar{t} \rrbracket_V \subseteq \Gamma_i$

$V, p^-(i, p, \bar{t}) \vdash p(\bar{t})$ iff $\{p\} \times \llbracket \bar{t} \rrbracket_V \subseteq \mathbb{D} \setminus \Gamma_i$

$V, \forall^+(x, P) \vdash \forall x.\ \alpha$ iff $\forall (D_k, sp_k) \in P.\ \mathsf{tp}(sp_k) = \mathsf{tp}(P)$ and $V[x \mapsto D_k], sp_k \vdash \alpha$

$V, \exists^+(x, d, sp) \vdash \exists x.\ \alpha$ iff $V[x \mapsto \{d\}], sp \vdash \alpha$


and dually for $\exists^-$ and $\forall^-$. Here, $\llbracket \bar{t} \rrbracket_V$ represents a transformation of the list of values
$\llbracket \bar{t} \rrbracket_v$ to the set of all possible lists of values generated by $V$. Set assignments allow us
to delay deciding values for quantifier subproofs to the predicate base case. Note that
$\{p\} \times \llbracket \bar{t} \rrbracket_V \subseteq \Gamma_i$ and $\{p\} \times \llbracket \bar{t} \rrbracket_V \subseteq \mathbb{D} \setminus \Gamma_i$ are decidable because, due to our partitions, we
only encounter finite and co-finite sets. The set-assignment-based validity check is thus
executable and provides the algorithm that we use as our formally verified proof
object checker: $v, p \vdash \alpha \longleftrightarrow (\lambda x.\ \{v(x)\}), p \vdash \alpha$ (proved by induction on $\alpha$ using Lemma 1).
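
The decidability claim for the two predicate cases can be illustrated in a few lines of Python. This is a sketch under stated assumptions, not the verified checker: sets are pairs `(elems, is_complement)` over an infinite domain, terms are `("var", x)` or `("const", c)`, events are tuples, and the function names are invented for this example. The satisfaction case can only hold if every involved set is finite, because the database at a time-point is finite; the violation case only has to inspect the finitely many events in the database.

```python
from itertools import product

def member(d, dset):
    elems, is_complement = dset
    return (d in elems) != is_complement

def term_values(term, V):
    kind, x = term
    return V[x] if kind == "var" else (frozenset({x}), False)

def matches(terms, args, V):
    """Is the value list `args` generated by the set assignment V for `terms`?"""
    binding = {}
    for t, d in zip(terms, args):
        if not member(d, term_values(t, V)):
            return False
        # A variable occurring twice must take the same value in both places.
        if t[0] == "var" and binding.setdefault(t[1], d) != d:
            return False
    return len(terms) == len(args)

def check_pred_viol(p, terms, V, gamma):
    """No event in gamma is an instance of p(terms) under V."""
    return not any(name == p and matches(terms, args, V) for (name, *args) in gamma)

def check_pred_sat(p, terms, V, gamma):
    """Every instance of p(terms) under V is an event in gamma."""
    names = sorted({t[1] for t in terms if t[0] == "var"})
    if any(V[x][1] for x in names):
        return False  # infinitely many instances cannot fit into a finite gamma
    for vals in product(*(V[x][0] for x in names)):
        env = dict(zip(names, vals))
        args = tuple(env[t[1]] if t[0] == "var" else t[1] for t in terms)
        if (p, *args) not in gamma:
            return False
    return True

gamma3 = {("approve", "Merlin", 187), ("publish", "Bob", 163), ("publish", "Alice", 163),
          ("publish", "Charlie", 163), ("publish", "Charlie", 152)}
everyone = (frozenset(), True)  # the whole domain, as a co-finite set
V = {"m": everyone, "f": (frozenset({152}), False)}
print(check_pred_viol("approve", [("var", "m"), ("var", "f")], V, gamma3))  # True
```

The printed check corresponds to the last step of Example 2 at time-point 3: no manager whatsoever approved file 152 there, and this is established without enumerating the domain.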


4 Partitioned Decision Trees 


Our proof system is parameterized with an assignment, but in our monitoring approach 
we are interested in computing a proof object for every assignment. In this section, we 
introduce partitioned decision trees (PDTs), a specialized data structure for representing 
and efficiently manipulating variable assignments, inspired by the use of BDDs in runtime
verification [17]. We want to represent functions of the form $f : \mathbb{D} \times \dots \times \mathbb{D} \to \mathfrak{p}$,
i.e., mappings from tuples of domain elements to proof trees, where each tuple corresponds
to a variable assignment to the formula's free variables. As argued in the previous
section, we are only interested in such functions with a finite range. Thus, we organize
the domain into a finite number of subsets $D_1 \times \dots \times D_n$ such that each tuple element is
partitioned separately (using valued partitions over the domain). As before, we work with
finite and co-finite sets in the partition. PDTs $\mathbb{P}(A)$ are defined inductively as follows:


\[ \mathbb{P}(A) = \mathsf{Leaf}\ A \mid \mathsf{Node}\ (V, \biguplus_{\mathbb{D}}(\mathbb{P}(A))) \]
[Figure: PDT whose root node branches on a into {Alice}, {Bob}, {Charlie}, and the complement of {Alice, Bob, Charlie}; the second level branches on f (one visible block is {163}); the leaves are proof trees.]


Fig. 6: Resulting PDT for our running example at time-point 3. 


PDTs have leaves and nodes. Leaves store objects from the set A, while nodes store pairs 
of the form (x, P), where x is a variable and P, a valued partition of the domain storing 
PDTs. PDTs generalize binary decision trees along two dimensions. First, the branching 
of their nodes is not binary but follows a given partition of the infinite domain D. Second, 
their leaves do not store Boolean values. Instead, they store arbitrary objects, even though 
we will mostly use them with proof objects A = p. PDTs provide a way to organize 
the infinitely many possible variable assignments in a structured manner, storing only 
finitely many different proof objects. In monitoring, partitions will arise naturally, guided 
by the values occurring in the stream and assembled via operations that combine them. 


Example 3. We continue the publish-approve example from Example 1. We consider the
same stream but drop the top-level quantifiers from the formula $\varphi$: we only consider $\varphi'$
with its free variables $a$ and $f$. Figure 6 shows the PDT
representing all assignments for $\varphi'$ at time-point 3. The root
node represents variable $a$, and the edges partition the values that $a$ can take into the
following domain subsets: $\{\mathsf{Alice}\}$, $\{\mathsf{Bob}\}$, $\{\mathsf{Charlie}\}$, and $\mathbb{D} \setminus \{\mathsf{Alice}, \mathsf{Bob}, \mathsf{Charlie}\}$.
The second level is analogous for variable $f$. At every level of the PDT, the union of all
choices covers the entire domain $\mathbb{D}$ (by definition of partitions) and the partitions may
differ at every node. The leaves of the PDT are different proof trees (formally, proof objects),
which we represent by small black triangles. For example, one leaf is the proof tree of $\varphi'$'s
violation shown in Example 1. In contrast, the proof tree shown in Figure 7 (occurring in
multiple leaves) is that of $\varphi'$'s vacuous satisfaction: the left-hand side of the implication
($\mathsf{publish}(a, f)$) is violated for any assignment $v$ updated by following the path from the
PDT's root to the respective leaf (e.g., taking $a = \mathsf{Alice}$ and $f = 42 \in \mathbb{D} \setminus \{163\}$).

\[ \dfrac{\dfrac{\mathsf{publish}(a, f) \notin \Gamma_3}{v, 3 \vdash^- \mathsf{publish}(a, f)}\ p^-}{v, 3 \vdash^+ \varphi'}\ {\to}^+_L \]

Fig. 7: Proof tree.


Since PDTs are a generalization of BDDs, we use similar functions to manipulate
them. We list the most important ones for partitions and PDTs in Figure 8, but we only
show and discuss the implementation of apply2, merge2, and hide. Most PDT functions
are parameterized by a variable list vs (of type V list) fixing the variable order. The functions
map_part and apply1 lift unary functions on objects to partitions and PDTs, respectively.

The functions merge2 and apply2 do the same for binary functions; apply2 generalizes
the well-known apply function on BDDs [16]. On leaves, apply2 applies f to the
objects. When operating on a leaf and a node, apply2 pushes f partially applied to the
leaf to the node's leaves using apply1. Finally, on pairs of nodes, it proceeds recursively
depending on which of x, y, and z are equal. The most interesting case, x = y = z, occurs
when both PDTs partition the domain values for z in different ways. Thus, we must
map_part :: (A ⇒ B) ⇒ ⨄_𝔻(A) ⇒ ⨄_𝔻(B)
merge2 :: (A ⇒ B ⇒ C) ⇒ ⨄_𝔻(A) ⇒ ⨄_𝔻(B) ⇒ ⨄_𝔻(C)
merge2 f [] P2 = []
merge2 f ((D1, v1) # P1) P2 =
  let P12 = map_filter (λ(D2, v2). if D1 ∩ D2 ≠ ∅ then Some (D1 ∩ D2, f v1 v2) else None) P2;
      P2' = map_filter (λ(D2, v2). if D2 \ D1 ≠ ∅ then Some (D2 \ D1, v2) else None) P2
  in P12 @ merge2 f P1 P2'

pdt_of :: V list ⇒ A ⇒ A ⇒ 2^(V ⇀ 𝔻) ⇒ ℙ(A)
split_prod :: ℙ(A × B) ⇒ ℙ(A) × ℙ(B)
split_list :: ℙ(A list) ⇒ ℙ(A) list
apply1 :: V list ⇒ (A ⇒ B) ⇒ ℙ(A) ⇒ ℙ(B)
apply2 :: V list ⇒ (A ⇒ B ⇒ C) ⇒ ℙ(A) ⇒ ℙ(B) ⇒ ℙ(C)
apply2 vs f (Leaf l1) (Leaf l2) = Leaf (f l1 l2)
apply2 vs f (Leaf l1) (Node (x, P2)) = Node (x, map_part (apply1 vs (λl2. f l1 l2)) P2)
apply2 vs f (Node (x, P1)) (Leaf l2) = Node (x, map_part (apply1 vs (λl1. f l1 l2)) P1)
apply2 (z # vs) f (Node (x, P1)) (Node (y, P2)) =
  if x = z and y = z then Node (z, merge2 (apply2 vs f) P1 P2)
  else if x = z then Node (x, map_part (λl. apply2 vs f l (Node (y, P2))) P1)
  else if y = z then Node (y, map_part (λr. apply2 vs f (Node (x, P1)) r) P2)
  else apply2 vs f (Node (x, P1)) (Node (y, P2))
apply3 :: V list ⇒ (A ⇒ B ⇒ C ⇒ D) ⇒ ℙ(A) ⇒ ℙ(B) ⇒ ℙ(C) ⇒ ℙ(D)
hide :: V list ⇒ (A ⇒ A) ⇒ (⨄_𝔻(A) ⇒ A) ⇒ ℙ(A) ⇒ ℙ(A)
hide vs leaf node (Leaf l) = Leaf (leaf l)
hide [z] leaf node (Node (x, P)) = Leaf (node (map_part unleaf P))
hide (z # vs) leaf node (Node (x, P)) =
  if x = z then Node (z, map_part (hide vs leaf node) P) else hide vs leaf node (Node (x, P))


Fig. 8: Selected functions on partitions and PDTs.


combine both partitions. For this, we use merge2 that takes two valued partitions $P_1$ and
$P_2$, and iteratively "erodes" $P_2$ by intersecting its elements with the sets in $P_1$ while
applying f. Since both $P_1$ and $P_2$ cover $\mathbb{D}$, the resulting set of intersections is a valued
partition. The function apply3 analogously combines three PDTs into one.

The function hide traverses the PDT similarly to apply1, while eliminating the last
variable in the given variable list. It uses two higher-order arguments, in case the last
layer is present (node) or absent (leaf). The function pdt_of vs A B V constructs a PDT
from a finite set of partial assignments ($V :: 2^{V \rightharpoonup \mathbb{D}}$) using A for leaves reached by paths
from the set, and B for the other leaves. Finally, the split_* functions transpose a PDT
storing pairs (lists of equal length) into a pair (list) of PDTs.
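
The erosion performed by merge2 can be tried out with a compact Python sketch. It assumes that partition blocks are pairs `(elems, is_complement)` denoting finite or co-finite sets over an infinite domain; the helper names `inter`, `diff`, and `is_empty` are introduced here for illustration and the loop is an iterative rendering of the recursive definition.

```python
def inter(A, B):
    """Intersection of two finite/co-finite sets."""
    (a, ca), (b, cb) = A, B
    if ca and cb:
        return (a | b, True)    # both co-finite: exclude the union
    if ca:
        return (b - a, False)   # finite B minus what A excludes
    if cb:
        return (a - b, False)
    return (a & b, False)

def diff(A, B):
    """A minus B, computed as the intersection of A with B's complement."""
    return inter(A, (B[0], not B[1]))

def is_empty(A):
    # Over an infinite domain, a co-finite set is never empty.
    return not A[1] and not A[0]

def merge2(f, P1, P2):
    out = []
    for D1, v1 in P1:
        # Pair D1 with every overlapping block of P2 ...
        out += [(inter(D1, D2), f(v1, v2)) for D2, v2 in P2
                if not is_empty(inter(D1, D2))]
        # ... and erode P2 by the part of the domain just handled.
        P2 = [(diff(D2, D1), v2) for D2, v2 in P2 if not is_empty(diff(D2, D1))]
    return out

P1 = [((frozenset({"Alice"}), False), 1), ((frozenset({"Alice"}), True), 2)]
P2 = [((frozenset({"Bob"}), False), 10), ((frozenset({"Bob"}), True), 20)]
for block, value in merge2(lambda x, y: x + y, P1, P2):
    print(sorted(block[0]), "complement" if block[1] else "finite", value)
```

The result has three blocks: Alice alone, Bob alone, and everything except Alice and Bob, each tagged with the combination of the two values that applied to it.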


5 Monitoring Algorithm 


We follow the typical online monitoring algorithm structure consisting of an initialization
and a step (evaluation) function [25, 30]. The initializer init (omitted as standard) computes
our monitor's initial state $s \in S$ from an MFOTL specification $\alpha$. Figure 9 shows
an excerpt of our monitor's state, which recursively follows the formula structure and
augments some operators with additional information, such as buffers storing verdicts
from subformulas ($B_2$ for $\land$ and $B_{2t}$ for $S$) or an operator-specific state ($S_{saux}$ for $S$).
B_2 = ℙ(𝔭) list × ℙ(𝔭) list      B_2t = ℙ(𝔭) list × ℙ(𝔭) list × (ℕ × ℕ) list      S_saux = …
S = MPred E (T list) | MAnd S S B_2 | MExists V S | MSince I S S B_2t (ℙ(S_saux)) | …
eval :: V list ⇒ ℕ ⇒ ℕ ⇒ 2^(E × 𝔻 list) ⇒ S ⇒ ℙ(𝔭) list × S
eval vs τ i Γ (MPred p ts) =
  let e = pdt_of (filter (λv. v ∈ fv(ts)) vs) (p⁺(i, p, ts)) (p⁻(i, p, ts))
            {σ | ∃ds. p(ds) ∈ Γ ∧ match ts ds = Some σ}
  in ([e], MPred p ts)
eval vs τ i Γ (MAnd s1 s2 buf) =
  let (es1, s1') = eval vs τ i Γ s1;  (es2, s2') = eval vs τ i Γ s2;
      (es, buf') = buf2_take (apply2 vs do_and) (buf2_add buf es1 es2)
  in (es, MAnd s1' s2' buf')
eval vs τ i Γ (MExists x s) =
  let (es, s') = eval (vs @ [x]) τ i Γ s
  in (map (hide (vs @ [x]) (do_exists_leaf x) (do_exists_node x)) es, MExists x s')
eval vs τ i Γ (MSince I s1 s2 buf saux) =
  let (es1, s1') = eval vs τ i Γ s1;  (es2, s2') = eval vs τ i Γ s2;
      (es, buf', saux') = buf2t_take (λe1 e2 (τ, i) saux.
          let (saux', es') = split_prod (apply3 vs (update_since τ i) e1 e2 saux)
          in (saux', split_list es')) (buf2t_add buf es1 es2 [(τ, i)]) saux
  in (es, MSince I s1' s2' buf' saux')


Fig. 9: Involved types and selected cases of the monitor's eval function


do_exists_leaf x p = if p ∈ 𝔰𝔭 then ∃⁺(x, d ← 𝔻, p) else ∃⁻(x, [(𝔻, p)])

do_exists_node x P = if ∃(D_k, p) ∈ P. p ∈ 𝔰𝔭
  then min (map_filter (λ(D_k, p). if p ∈ 𝔰𝔭 then Some (∃⁺(x, d ← D_k, p)) else None) P)
  else ∃⁻(x, P)


Fig. 10: Functions do_exists_leaf and do_exists_node.


Our function eval, partly shown in Figure 9, takes as inputs a new time-point $i$ (along
with its time-stamp $\tau$ and database $\Gamma$) and a monitor state $s$ and outputs the next state
$s'$ and a list of PDTs of proof objects as verdicts. (In addition, eval keeps track of the
variable ordering used in PDTs via the parameter vs.) Lists in the output are necessary
because delays may occur for (bounded) future operators and a single time-point might
trigger multiple outputs. Our algorithm extends Lima et al.'s algorithm computing
proof trees for MTL. We highlight our key additions to eval and the state (Figure 9) in gray.

We focus on the predicate, conjunction, existential quantifier, and since cases. In the
predicate case, we find all partial assignments $\sigma$ mapping the predicate's variables to
the values ds, so that $p(ds) \in \Gamma$. We reuse VeriMon's match function to compute
such partial assignments. We convert this set of assignments to a PDT using pdt_of.
In the resulting PDT, matching assignments lead to leaves using the satisfaction proof
$p^+(i, p, ts)$, whereas the others lead to the corresponding violation proof $p^-(i, p, ts)$.

The conjunction case is taken almost without changes from Lima et al.'s MTL
algorithm. We reuse the buffering functions buf2_add and buf2_take. The first adds
partial results to the buffer, while the second combines these results and dequeues them
once both subformulas have produced results for a time-point. The only difference is that
our buffers store PDTs of proof objects, whereas the MTL algorithm works with propositional
proof objects. Accordingly, we reuse Lima et al.'s function $\mathsf{do\_and} :: \mathfrak{p} \Rightarrow \mathfrak{p} \Rightarrow \mathfrak{p}$
to combine two proof objects conjunctively, but lift it to PDTs using apply2.

The quantifier cases are a new addition of our work. As both cases proceed dually,
we focus on $\exists x.\ \alpha$ formulas. Considering that $\alpha$ may have one more free variable than
$\exists x.\ \alpha$, the recursive call appends $x$ to the variable list ordering. The recursive call's output
is processed using our function hide to eliminate the quantified variable. The interesting
cases occur near the leaves of $\alpha$'s PDTs. If $x$ is not present, hide will encounter a leaf,
i.e., a proof object, and use the function do_exists_leaf (Figure 10) to perform a case
distinction: satisfactions result in a satisfaction ($\mathfrak{sp}$) of $\exists x.\ \alpha$ with an arbitrary element
$d$ of the domain as the witness (we write $x \leftarrow X$ to denote an arbitrarily chosen element
$x$ of a non-empty set $X$); violations result in a violation ($\mathfrak{vp}$) with the trivial partition. If
$x$ is present as the last decision node, then hide will use do_exists_node (Figure 10) to
construct the proof object for $\exists x.\ \alpha$. It performs a case distinction on whether a satisfaction
proof is contained in the partition of this last node. If it is, $\exists x.\ \alpha$ is satisfied and we
compute the smallest (in proof size) such satisfaction proof, taking as our witness an
arbitrary element of the respective partition set. Otherwise, all leaves are violations and
we obtain a violation proof of $\exists x.\ \alpha$.
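
The case distinction at the last decision node can be sketched in Python as follows. The encoding is an assumption made for this illustration only: proof objects are nested tuples `(rule, *children)`, the caller supplies a predicate `is_sat` that recognizes satisfactions and a function `pick` that chooses a witness from a partition block, and proof size is the number of tuple nodes.

```python
def size(proof):
    """Number of nodes in a proof object encoded as nested tuples."""
    return 1 + sum(size(c) for c in proof[1:] if isinstance(c, tuple))

def do_exists_node(x, partition, is_sat, pick):
    """partition: list of (block, proof) pairs for the values of x."""
    sats = [(block, p) for block, p in partition if is_sat(p)]
    if not sats:
        # Every block violates the body, so the partition itself is the evidence.
        return ("exists-", x, partition)
    # Otherwise keep only a smallest satisfaction and one witness from its block.
    block, p = min(sats, key=lambda bp: size(bp[1]))
    return ("exists+", x, pick(block), p)

small = ("p+", 2, "approve")
big = ("and+", ("p+", 2, "mgr"), ("p+", 2, "approve"))
viol = ("p-", 2, "approve")
partition = [(frozenset({"Merlin"}), big), (frozenset({"Mallory"}), small), ("rest", viol)]

result = do_exists_node("m", partition, lambda p: p[0].endswith("+"), min)
print(result)  # ('exists+', 'm', 'Mallory', ('p+', 2, 'approve'))
```

Choosing a smallest satisfaction keeps explanations short, while a violation necessarily retains the whole partition because every block contributes to the evidence.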

To reuse Lima et al.'s temporal operator evaluation, our state stores a PDT whose
leaves are the auxiliary state of these algorithms (instead of proof objects). This allows us
to keep the complex auxiliary state and its update unchanged. For example, we use apply3
to lift Lima et al.'s update_since function to two PDTs storing proof objects for subformulas
and a third one storing the auxiliary state. The resulting PDT has type $\mathbb{P}(S_{saux} \times \overline{\mathfrak{p}})$,
which we transpose into the desired $\mathbb{P}(S_{saux}) \times \overline{\mathbb{P}(\mathfrak{p})}$ using split_prod and split_list.


6 Implementation and Case Study 


We implement our algorithm in a new monitoring tool, called WhyMon [2]. Our implementation
consists of 4500 lines of OCaml code and incorporates an optimization of
collapsing partition sets with the same stored values both in proof objects and in PDTs.
Our formally verified checker contributes an additional 1700 lines of OCaml code generated
from our Isabelle formalization, which itself comprises 6400 lines of definitions
and proofs. The checker's main function lifts the validity check of proof objects ($\vdash$) to
PDTs, i.e., $\mathsf{check} :: \mathsf{trace} \Rightarrow \mathsf{formula} \Rightarrow \mathsf{pdt} \Rightarrow \mathsf{bool}$, and is used to certify WhyMon's
output. WhyMon includes a visualization [3] implemented in React that consists
of 2400 lines of JavaScript and invokes a JavaScript version of our monitor, generated
by Js_of_ocaml [32]. Here, we consider the data race policy that captures
possible concurrency issues in multithreaded programs on a stream prefix generated
by Raszyk [Section 4.3]. Furthermore, we consider Nokia's Data-collection Campaign [4],
which comes with a stream prefix of around 5 million time-points [1], for which
we focus on the del-2-3 policy controlling data propagation between databases. We
describe a violation for each scenario highlighting the advantages of our approach.


Example 4. We first return to our earlier example in our visualization tool, depicted in Figure 11.
The table includes TP (time-points), TS (time-stamps), and Values columns. The following
columns show the topmost operator of each of $\varphi'$'s subformulas or its predicate names (and
their variables). In the Values column, for each of the already evaluated time-points, there
is an associated button enclosing a ✓ (for satisfactions), a ✗ (for violations), or both.
After clicking on this button, we are presented with a dropdown menu (as in Figure 12)
that corresponds to a partition. The listed values are the (potentially multiple) variable
assignments of the resulting PDT for that specific time-point. The formula $\varphi'$ contains
two free variables, $a$ and $f$, and to single out a verdict we must select one value for
each. In particular, at time-point 3 we select $a \mapsto \mathit{Charlie}$ and $f \mapsto 152$. Note that in the
visualization we focus on readability and omit set parentheses. Moreover, Other denotes
the complement of the listed values. After choosing the assignments, a Boolean verdict
appears in the next column matching the topmost operator of $\varphi'$, namely $\rightarrow$. Clicking
on this Boolean verdict reveals and highlights the Boolean verdicts associated with its
justification. The subformulas' columns of the current inspection are also highlighted.
In this case, the implication is violated because the left side is satisfied, while the temporal
subformula on the right side is violated. We can explore this verdict further: the violation is justified by
those of its subformula at time-points 2 and 3 (the time-points inside the interval are
also highlighted). For each time-point, there is another dropdown menu where we can
select an assignment for $m$. Here, the only listed value is Any, which corresponds to the domain $\mathbb{D}$.
Thus, the existential quantifier is violated because the subformula approve(m, 152) is
violated for all values that $m$ can be assigned to ($\mathbb{D}$), and all justifications are identical.


Data Race Detection Multithreaded programs are pervasive and hard to debug. In par- 
ticular, they are prone to data races, which occur when two threads access (read or write 
to) a shared address concurrently and at least one of these accesses is a write. Locking 
mechanisms that synchronize access to variables shared between threads are a plausible 
solution. We consider the following policy to detect data race potentials [18]:


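$\varphi_{dr} = \mathit{datarace}(t_1, t_2, x) \rightarrow \exists l.\ (\mathit{acqnrel}(t_1, x, l) \land \mathit{acqnrel}(t_2, x, l))$, with
$\mathit{datarace}(t_1, t_2, x) = \blacklozenge(\mathit{read}(t_1, x) \lor \mathit{write}(t_1, x)) \land \blacklozenge\,\mathit{write}(t_2, x)$, and
$\mathit{acqnrel}(t, x, l) = \blacksquare((\mathit{read}(t, x) \lor \mathit{write}(t, x)) \rightarrow (\neg\mathit{rel}(t, l) \mathbin{\mathsf{S}} \mathit{acq}(t, l)))$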


where the predicates read(t, x) and write(t, x) specify read and write operations
performed by thread t on shared address x, and acq(t, l) and rel(t, l) specify the acquisition
and the release of lock l by thread t. Havelund et al. consider a closed formula
variant of this policy, as their tool, DEJAVU, only supports closed formulas. In contrast,
WHYMON supports open formulas. We consider the stream prefix:


((0, {acq(9,9)}), (1, {read(9,3)}), (2, {acq(13,19)}), (3, {acq(15,3)}),
(4, {acq(18,15)}), (5, {read(13,5)}), (6, {write(15,4)}), (7, {write(15,3)}), ...)


At time-point 7, WHYMON outputs a PDT with non-trivial assignments. We focus on the
single violation in this PDT, which corresponds to the assignment ({9}, {15}, {3}) for
($t_1$, $t_2$, $x$). This violation is shown in Figure 13. The topmost operator of $\varphi_{dr}$ is an implication,
and it is violated because the left side is satisfied (there was a data race), while the right
side (the lock requirement) is violated. Specifically, the data race occurred because
thread $t_1 = 9$ read address 3 at time-point 1, satisfying the $\blacklozenge_{[0,\infty)}$ subformula in the left
conjunct, and thread $t_2 = 15$ wrote to address 3 at the current time-point 7, satisfying the
$\blacklozenge_{[0,\infty)}$ subformula in the right conjunct. Moving to the right side of the implication, the
violation of the existential indicates that its subformula is violated for every value of $\mathbb{D}$.
In particular, the subsets of the domain $\{9\}$ and $\{9\}^c$ are each associated with a different
violation. Here, we focus on the violation where $l \mapsto 9$. The subformula is a conjunction,
and to be violated it suffices that one of the conjuncts is violated. This violation stems
from the violation of the right conjunct $\blacksquare_{[0,\infty)}$ (note that $t_2 = 15$ is listed as the variable
in the predicate columns). We omit the columns referring to the left conjunct, since all
entries are empty. Once again, the implication is violated because the left side is satisfied,
i.e., thread $t_2 = 15$ wrote to address 3 at time-point 7, satisfying the disjunction, but
$\mathsf{S}_{[0,\infty)}$ on the right side is violated, because thread $t_2 = 15$ never acquired the lock $l = 9$.
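
The verdict discussed above can be cross-checked with a brute-force evaluation of the policy on the eight listed time-points. The following is only a sketch under stated assumptions: it computes Boolean verdicts directly from the past-time semantics and does not reflect how WHYMON builds proof objects or PDTs. All function names (`access`, `holds_lock`, `common_locks`, and so on) are illustrative and not taken from the tool.

```python
# Brute-force evaluation of the data race policy on the stream prefix above.
# Illustrative only: WHYMON produces proof objects, this just computes verdicts.
PREFIX = [
    {("acq", 9, 9)}, {("read", 9, 3)}, {("acq", 13, 19)}, {("acq", 15, 3)},
    {("acq", 18, 15)}, {("read", 13, 5)}, {("write", 15, 4)}, {("write", 15, 3)},
]


def access(t, x, j):
    """read(t, x) or write(t, x) holds at time-point j."""
    return ("read", t, x) in PREFIX[j] or ("write", t, x) in PREFIX[j]


def datarace(t1, t2, x, i):
    """Once (t1 accesses x) and once (t2 writes x), both looking back from i."""
    return (any(access(t1, x, j) for j in range(i + 1))
            and any(("write", t2, x) in PREFIX[j] for j in range(i + 1)))


def holds_lock(t, l, j):
    """(not rel(t, l)) S acq(t, l) at j: acquired at some k <= j, no release after k."""
    return any(
        ("acq", t, l) in PREFIX[k]
        and all(("rel", t, l) not in PREFIX[m] for m in range(k + 1, j + 1))
        for k in range(j + 1)
    )


def acqnrel(t, x, l, i):
    """Historically: every access of x by t happened while t held lock l."""
    return all(not access(t, x, j) or holds_lock(t, l, j) for j in range(i + 1))


def common_locks(t1, t2, x, i):
    """Locks l with acqnrel(t1, x, l) and acqnrel(t2, x, l) at time-point i.

    Only locks occurring in the prefix are tried. This is sufficient when both
    threads access x, because a lock that is never acquired cannot protect
    an access.
    """
    locks = {e[2] for events in PREFIX for e in events if e[0] in ("acq", "rel")}
    return {l for l in locks if acqnrel(t1, x, l, i) and acqnrel(t2, x, l, i)}


race = datarace(9, 15, 3, 7)
locks = common_locks(9, 15, 3, 7)
violated = race and not locks
```

Under these assumptions, the left side of the implication holds for threads 9 and 15 on address 3, while no lock protects both accesses, which agrees with the violation described in the text.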


Data Propagation Nokia's Data-collection Campaign [4] used three databases $db_1$, $db_2$,
and $db_3$ in the collection of sensitive information from mobile phones of participants. We
focus on the policy $\varphi_{del}$ [12], which controls the data propagation between databases $db_2$
and $db_3$: if data is deleted from $db_2$, then it must be deleted from $db_3$ within 1 minute.


$\varphi_{del} = \mathit{delete}(x, db_2, y, data) \land data \neq \text{[unknown]} \rightarrow \Diamond_{[0,60)}\, \exists u, v.\ \mathit{delete}(u, db_3, v, data)$


where $db_2$, $db_3$, and [unknown] are constants and $\mathit{delete}(db_{user}, db, p_{id}, data)$ specifies
the deletion of $data$ from participant $p_{id}$ from database $db$ using database user $db_{user}$.
We used the REPLAYER tool to convert the stream prefix to WHYMON's format. 
We executed WHYMON's command line interface with the entire prefix and found two 
violations. The following experiments were conducted on a computer with an Apple M1 
Chip (8 cores) and 16GB of RAM. WHYMON took 17m51s to process the entire prefix. 
We also executed MONPOLY with a slightly modified yet equivalent policy (due to
monitorability restrictions), and its running time amounted to 1m10s. MONPOLY outperforms
WHYMON, but we must acknowledge the different outputs the two monitors produce.
MONPOLY only outputs variable assignments, whereas WHYMON outputs entire PDTs
containing all assignments and a justification of the verdict in the form of a proof tree for each.
We extract 100 time-points containing both violations and focus on the violation at
time-point 79 for the assignment ({189810327}, {user2}, {[unknown]}) for ($data$, $x$, $y$),
depicted in Figure 14. Time-stamps are converted to actual dates (by enabling the option)
and we omit time-points that do not contain relevant events for the violation. Let


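$\Gamma_{79} = \{\mathit{delete}(\text{user2}, db_2, \text{[unknown]}, 189810327)\}$,
$\Gamma_{80} = \{\mathit{delete}(\text{triggers}, db_3, \text{[unknown]}, \text{[unknown]})\}$,
$\Gamma_{81} = \{\mathit{delete}(\text{user2}, db_2, \text{[unknown]}, 189810328)\}$, and $\Gamma_{82} = \Gamma_{83} = \Gamma_{84} = \emptyset$.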


The implication is violated because the left side is satisfied (there was a deletion at the
current time-point 79), but $\Diamond_{[0,59]}$ is violated. Note that [0,60) was replaced with the
equivalent interval [0,59]. For each time-point of $[E_{79}([0,59]), L_{79}([0,59])] = \{79, \ldots, 84\}$, the
subformula is violated. Regardless of the values we assign to $u$ and $v$ (all violations are
identical), the subformula $\mathit{delete}(u, db_3, v, 189810327)$ is violated.


7 Conclusion 


We describe an approach for MFOTL monitoring with verdicts in the form of proof
objects for every free variable assignment. Such verdicts are useful for understandability and
certification, which increases the monitor's trustworthiness. We implement our approach
in the tool WHYMON along with an interactive visualization for these verdicts, which we
invite the reader to explore [3]. As future work, we plan to provide support for equality
between variables and to improve our monitor's performance by, e.g., stream slicing [29].


Data Availability Statement Our artifact [26] includes WHYMON’s source code at the 
artifact submission time together with instructions on how to set up WHYMON locally, 
extract our PDT checker, execute our examples, and replicate our case study. 


Abstract. Satisfiability modulo theories (SMT) solvers are widely used
to ensure the correctness of safety- and security-critical applications. 
Therefore, being able to trust a solver's results is crucial. One way to 
increase trust is to generate independently checkable proof certificates, 
which record the reasoning steps done by the solver. A key challenge 
with this approach is that it is difficult to efficiently and accurately pro- 
duce proofs for reasoning steps involving term rewriting rules. Previous 
work showed how a domain-specific language, RARE, can be used to cap- 
ture rewriting rules for the purposes of proof production. However, in 
that work, the RARE rules had to be trusted, as the correctness of the 
rules themselves was not checked by the proof checker. In this paper, 
we present IsaRARE, a tool that can automatically translate RARE rules 
into Isabelle/HOL lemmas. The soundness of the rules can then be veri- 
fied by proving the lemmas. Because an incorrect rule can put the entire 
soundness of a proof system in jeopardy, our solution closes an important 
gap in the trustworthiness of SMT proof certificates. The same tool also 
provides a necessary component for enabling full proof reconstruction of 
SMT proof certificates in Isabelle/HOL. We evaluate our approach by 
verifying an extensive set of rewrite rules used by the cvc5 SMT solver. 


1 Introduction 


Satisfiability modulo theories (SMT) [8] solvers provide the back-end reasoning 
power for many formal methods applications. These applications are often used 
to provide safety or security guarantees for critical systems [1, 15, 21, 23]. For 
such applications, an incorrect result from a solver could have catastrophic con- 
sequences. Thus, ensuring the correctness of a solver's results is crucial. However, 
industrial-strength SMT solvers are large and complex software systems which 
are under constant active development. As with any other large software project, 
even when employing software engineering best practices, it is unrealistic to
expect that solvers do not contain implementation bugs that could, in the worst
case, compromise the correctness of their answers. 


One solution is to formally verify the SMT solver itself. Unfortunately, that
would be a massive effort. It would likely require performance compromises [17] 
and impose a tremendous maintenance burden, as changes to solvers are frequent, 
and each change would require revisiting the verification. 


Fortunately, there is a less expensive alternative: we can independently check 
each result produced by a solver. This is generally easy when the result is
“satisfiable,” at least for quantifier-free inputs. The solver can produce a model and we
can check via evaluation that the input formula indeed holds in it. To have a
similar ability to check a result of “unsatisfiable,” solvers must be instrumented to
produce proof certificates that can be independently verified by a separate proof 
checker. To maximize trustworthiness, the proof checker should be small, sim- 
ple, and, ideally, formally verified. Alternatively, the checker can be embedded 
in a highly trusted system such as a skeptical interactive theorem prover. The 
SMT community is increasingly embracing proof production, with it becoming 
a major focus in recent years [3, 4, 19, 29]. 


One of the main challenges faced by SMT proof production efforts is the 
extensive use of theory-specific term rewriting rules. There are hundreds of such 
rules in modern solvers, each of which must be justifiable using some proof 
rule. Nötzli et al. [28] introduced a methodology for producing proofs for term
rewriting rules by using the RARE domain-specific language. In that work, rules 
are defined in RARE, imported by a solver, and then used to elaborate the solver's 
term rewriting proof steps into finer-grained proofs using the RARE rules. This 
approach has proved to be viable in the cvc5 SMT solver [2]. However, previous 
work did not address the correctness of the rules, i.e., it does not ensure that an 
incorrect RARE rule does not compromise the correctness of proof certificates. 


An incorrect rule can have severe consequences. First of all, it may affect the 
ability of the solver to produce a proof certificate at all: if the incorrect rule does 
not match what the solver code does, then the elaboration of the term rewriting 
proof steps with RARE may fail. More concerningly, if both the code and the 
proof rule are incorrect in the same way (perhaps because one was modeled 
after the other), then proof elaboration may succeed, but the proof certificate 
will be incorrect because it uses an invalid rule. This is especially problematic 
when using proof checkers that consider proof rules as trusted—that is, they only 
check whether rules are applied correctly and do not check the rules themselves. 


There are two ways to fill this gap. One is to separately verify the proof rules;
another is to use a more sophisticated proof checker, for example, one embedded 
in a skeptical interactive theorem prover, that will fail if an invalid rule is used. 
In this paper, we introduce IsaRARE, a new plugin for the Isabelle/HOL proof 
assistant [27] (abbreviated to just Isabelle going forward), which can do the for- 
mer and is a necessary step towards the latter. The plugin translates RARE rules 
into the language of Isabelle where they can then be formally proved as lemmas. 
Note that when using IsaRARE simply as a rewrite rule verifier, the translation 
from RARE to Isabelle becomes another trusted component. We mitigate this by
reusing extensively-tested infrastructure in Isabelle for the translation. 

To show the effectiveness of IsaRARE, we implemented a large number of 
new rules in RARE (beyond those in [28]) needed to elaborate term rewriting 
steps in proofs generated by the cvc5 SMT solver [2]. We show that IsaRARE 
can translate all of these rules into corresponding lemmas in Isabelle and can 
prove the majority of them automatically. In ongoing work, we are manually 
providing proofs for the rest, and have already proven most of them. 

Our long-term vision is to enable the full integration of cvc5 and Isabelle via
proof certificate reconstruction. Currently, Isabelle can send proof obligations to
cvc5, but it is unable to automatically reconstruct Isabelle proofs from cvc5's
proof certificates. Our goal is to enable Isabelle to reconstruct every step in these 
proof certificates. In order to reach this goal, it is essential to have rewrite lemmas 
available for reconstructing rewrite steps, as they appear in almost all proofs, and 
without dedicated support for discharging rewrite proof steps, reconstruction in 
Isabelle can fail [11, 31]. 

In summary, we make the following contributions: 


— we introduce IsaRARE, an Isabelle plugin for generating correctness lemmas 
for RARE rules; 

— we add several new features to RARE itself and implement 163 new rewrite 
rules in RARE, almost tripling the size of the rule database from [28]; 

— we evaluate IsaRARE, showing that it can translate all of the RARE rules 
into Isabelle lemmas and can prove the majority of them automatically. 


In the rest of the paper, after surveying related work, we give an overview 
of proof production and the interface to Isabelle (Section 2). Then, we present 
the RARE language and our extensions (Section 3). We next introduce IsaRARE 
and explain the challenges in transforming a RARE rule to an Isabelle lemma 
(Section 4). Finally, we present an evaluation of our approach (Section 5). 


1.1 Related Work 


Various attempts at proof production in SMT solvers have been implemented in 
the past [7, 13, 14, 22, 25], though these implementations typically either
produce proof certificates that are too coarse-grained (that is, they do not provide
enough information for efficient proof checking) or produce them only if critical 
components are disabled, making solving while producing proofs slow or incom- 
plete. Producing complete, independently-checkable proofs remains challenging. 

One major challenge is solved by the modular framework by Barbosa et al. [3]. 
It enables proof production during term rewriting and formula processing and 
has been implemented in the SMT solver veriT [13] using the Alethe proof format 
[32]. Hoenicke and Schindler [19] introduce an alternative approach, implemented 
in the solver SMTInterpol [14], which also allows proof production for term 
rewriting and formula processing. Both of these approaches assume that the 
set of rewrite rules that can be used in proofs is fixed. Their sets include rules 
for rewriting over equality, rules for rewriting Boolean formulas, and rules for
reasoning about arithmetic. Notably absent, however, are rules for string and
bit-vector rewrites. In other work, Barbosa et al. [4] describe a general architecture
where the only holes in the generated proof certificates are those from rewrite
steps. One of their key ideas is to support lazy proof production via a
post-processing proof reconstruction step. This capability is leveraged in the work by
Nötzli et al. [28] to produce proofs for rewrite steps based on rules written in
RARE, which is the starting point for this work.

The interactive theorem prover Isabelle [30] includes a popular tool called 
Sledgehammer [9], which encodes proof obligations as SMT problems and uses 
SMT solvers to solve them. Sledgehammer currently supports proof
reconstruction [12, 18] for two SMT solvers: Z3 [26] and veriT [13]. However, Z3 provides
only coarse-grained proofs, which can cause reconstruction to fail. This issue
was addressed for veriT by manually translating and proving correct in Isabelle 
the predefined set of rewrite rules in Alethe [18, 31]. Our work improves on this 
effort by providing an automatic mechanism for translating an extendable set of 
rewrite rules into Isabelle and includes support for bit-vector and string rewrites 
unsupported by veriT. 


2 Preliminaries 


2.1 Satisfiability Modulo Theories (SMT) 


The underlying logic of SMT is many-sorted first-order logic with equality (see,
e.g., [16]). A signature $\Sigma$ consists of a set $\Sigma^s \subseteq S$ of sort symbols and a set $\Sigma^f$
of sorted function symbols with sorts from $\Sigma^s$. We assume the usual definitions
of well-sorted terms, literals, and formulas. We also use the usual definition of
interpretations and of a satisfiability relation $\models$ between $\Sigma$-interpretations and
$\Sigma$-formulas. A $\Sigma$-theory $T$ is a non-empty class of $\Sigma$-interpretations closed under
variable reassignment. A $\Sigma$-formula $\varphi$ is $T$-satisfiable (resp., $T$-unsatisfiable,
$T$-valid) if it is satisfied by some (resp., no, all) interpretation(s) in $T$. For the rest
of the paper, we assume (un)satisfiability is always with respect to some given 
background theory T. 


2.2 SMT Proofs and Rewriting 


A proof (of unsatisfiability) is a series of inference steps starting from an input 
formula and terminating with $\bot$, showing that the input formula is unsatisfiable.
The granularity of a proof step refers to how much reasoning it requires and 
roughly corresponds to the complexity of checking that the step is correct. In 
particular, steps (and thus the proofs containing them) are fine-grained if they 
can be efficiently checked, and coarse-grained otherwise. We will often refer to 
coarse-grained steps as holes. 

One approach for the efficient production of proofs is to introduce
coarse-grained proof steps for certain performance-critical deductions made while
solving and then go back and fill in these holes with fine-grained steps as a
post-processing step. We refer to this as proof elaboration, and it is particularly
appealing for rewriting steps, since SMT solvers have hundreds of different rewrites to
simplify and normalize terms, and instrumenting the rewriting code to produce 
fine-grained proofs is difficult and may introduce an unacceptable degradation 
in performance. 

The approach taken by Nötzli et al. [28], and the one we also follow in this
paper, is to assume that the SMT solver uses generic proof steps for all rewrites 
during solving and then elaborates these steps during post-processing by consult- 
ing a database of specific rewrite rules. The database is constructed by defining 
a set of rewrite rules in the domain-specific language RARE, which we discuss in 
Section 3. The elaboration tries to find one or more rules from the database to 
justify each generic, coarse-grained rewrite step. Additionally, it uses a built-in 
evaluate rule to justify steps that hold purely via constant folding. If elabora- 
tion is successful, the generic step is replaced by the fine-grained steps from the 
database. 
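
The elaboration loop described above can be pictured with a toy sketch. It is not cvc5's implementation: terms are modelled as nested tuples, the two database rules and their names (`arith-plus-zero`, `bool-double-neg`) are hypothetical, and only single-rule justifications are attempted, whereas a real elaborator may chain several rules.

```python
# A toy elaborator: justify a generic rewrite step lhs -> rhs either by a
# database rule or by the built-in "evaluate" rule (constant folding).
# Terms are nested tuples such as ("+", "x", 0); all names are hypothetical.


def plus_zero(term):
    """x + 0 -> x"""
    if isinstance(term, tuple) and len(term) == 3 and term[0] == "+" and term[2] == 0:
        return term[1]
    return None


def double_neg(term):
    """not (not p) -> p"""
    if (isinstance(term, tuple) and term[0] == "not"
            and isinstance(term[1], tuple) and term[1][0] == "not"):
        return term[1][1]
    return None


DATABASE = [("arith-plus-zero", plus_zero), ("bool-double-neg", double_neg)]


def evaluate(term):
    """Fold + and * over integer literals; return None if the term is not constant."""
    if isinstance(term, int):
        return term
    if isinstance(term, tuple) and len(term) == 3 and term[0] in ("+", "*"):
        a, b = evaluate(term[1]), evaluate(term[2])
        if a is None or b is None:
            return None
        return a + b if term[0] == "+" else a * b
    return None


def elaborate(lhs, rhs):
    """Return the name of a rule justifying lhs -> rhs, or None if it stays a hole."""
    for name, rule in DATABASE:
        result = rule(lhs)
        if result is not None and result == rhs:
            return name
    value = evaluate(lhs)
    if value is not None and value == evaluate(rhs):
        return "evaluate"
    return None
```

A step that neither a database rule nor constant folding can justify is reported as `None`, which corresponds to the generic step remaining a hole in the proof.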


2.3 SMT in Isabelle 


As mentioned above, Sledgehammer [9] is an Isabelle tactic that applies auto- 
mated reasoning tools, including SMT solvers, to prove goals in Isabelle. When 
targeting an SMT solver, the goal is encoded as an SMT-LIB [5] problem which 
is unsatisfiable iff the goal is valid. Sledgehammer also selects facts that it thinks 
will be relevant for solving the goal and includes encodings of them as well. The 
problem is given to the solver which reports back to Sledgehammer whether it 
was able to prove the goal [9]. Proving the goal externally, however, is not enough 
since Isabelle is a skeptical proof assistant, in the sense that it does not trust 
external solvers. Thus, a proof of the goal must somehow be constructed and 
checked inside Isabelle. 

Finding such a proof internally can be challenging. One useful technique is to 
query the external solver for an unsat core, i.e., a subset of the facts it was given 
that are sufficient to prove the goal valid. Sometimes, this information is enough 
for Isabelle to search for an internal proof on its own. However, this process can 
be greatly improved, if, instead of just communicating the result and the core 
back to Sledgehammer, the solver also communicates a fine-grained proof. Then, 
with the appropriate proof reconstruction machinery, each step in the proof can 
be reconstructed as one or more steps using Isabelle’s internal inference engine. 
As mentioned in Section 1.1, Sledgehammer can do this for proofs from the veriT 
and Z3 solvers, though the former supports only a limited set of theories, and 
the latter produces only coarse-grained proofs. 

Still, this means that Isabelle already has an integration with solvers sup- 
porting the SMT-LIB standard and is able to translate to and from SMT-LIB 
and internal terms. We build on this integration and extend it. Notice that such 
an integration requires each SMT-LIB operator to be matched with a term in 
Isabelle with the same semantics. Isabelle has built-in operators that match well 
with those in the uninterpreted function and arithmetic SMT theories, and both 
formalisms support quantifiers [18]. However, Isabelle only has partial support
for bit-vector operators. A more complete development of bit-vectors in Isabelle
is described by Böhme et al. [11], but unfortunately, parts of their work
(including parsing bit-vector proofs) never made it into Isabelle and now appear to
be lost. As we describe below, part of our effort includes improving support for 
SMT theories in Isabelle, including bit-vectors and strings. 


2.4 Approximate Sorts 


RARE rules are meant to be easy and effortless to write. This is not the case when 
users have to specify sort information that is either inferable from the context or 
too restrictive. As an example of the latter, consider any rewrite rules involving 
bit-vector sorts. The SMT-LIB standard provides bit-vector sorts that are
parameterized by their size, or bit-width. However, to keep sort checking simple, it
requires all bit-widths in SMT-LIB scripts to be concrete as, for instance, in
(_ BitVec 8). A similar argument applies to polymorphic sorts because, although
SMT-LIB allows the definition of theories with such sorts (such as, for instance, 
array, set, and sequence sorts), it restricts scripts to monomorphic instantiations 
of polymorphic sorts — e.g., (Set Int). 

Unfortunately, these restrictions are too strong for RARE. They make it im- 
possible, for example, to write any rewrite rule involving bit-vector terms that 
is naturally parametric in the bit-width of those terms, or any rule involving 
terms with a polymorphic sort. The ideal solution would then be to introduce 
dependent types (or sorts, to maintain the SMT-LIB terminology) in RARE, 
allowing both value and type parameters in sorts — e.g., (_ BitVec n) with n 
an integer variable, and (Array A B) with A and B type variables. However, this 
would make it difficult for SMT solvers, cvc5 included, to process RARE rules
since, effectively, they only support non-dependent, monomorphic sorts. 

RARE’s compromise solution is to add instead approximate sorts to the sort 
system, following an approach analogous to gradual typing in programming lan- 
guages [33], a hybrid type-checking discipline where some program types are 
checked statically and others are checked dynamically. In our case, where there 
is no notion of dynamic checking, we have instead two sort-checking phases in 
the SMT solver for RARE rules: (i) as the rules are read by the solver, when sort 
checking is done with respect to the declared approximate sorts, and (ii) during
proof elaboration, when the approximate sorts in the RARE rules are matched 
against the exact sorts in the proof steps that correspond to those rules. 

Approximate sorts are obtained by extending the sort system of SMT-LIB 
with a distinguished unknown value and a distinguished unknown sort, both 
denoted by ?, that can be used in place of a value or parameter in a sort. This 
allows the construction of approximate sorts such as (_ BitVec ?), (Set ?), and
(Array ? ?) (abbreviated as ?BitVec, ?Set, and ?Array), while still allowing
precise sorts such as (_ BitVec 1), (Set Real), and (Array Int Real). Approximate
sorts can be used to approximate dependently-sorted/polymorphic rewrite rules, 
as we see in the next section. 
⟨rule⟩    ::= ( define-rule ⟨symbol⟩ ( ⟨par⟩* ) [⟨defs⟩] ⟨expr⟩ ⟨expr⟩ )
            | ( define-rule* ⟨symbol⟩ ( ⟨par⟩* ) [⟨defs⟩] ⟨expr⟩ ⟨expr⟩ [⟨expr⟩] )
            | ( define-cond-rule ⟨symbol⟩ ( ⟨par⟩* ) [⟨defs⟩] ⟨expr⟩ ⟨expr⟩ ⟨expr⟩ )
⟨par⟩     ::= ⟨symbol⟩ ⟨sort⟩ [:list]
⟨sort⟩    ::= ⟨symbol⟩ | ? | ?⟨symbol⟩ | ( ⟨symbol⟩ ⟨numeral⟩* )
⟨expr⟩    ::= ⟨const⟩ | ⟨id⟩ | ( ⟨id⟩ ⟨expr⟩* )
⟨id⟩      ::= ⟨symbol⟩ | ( ⟨symbol⟩ ⟨numeral⟩* )
⟨binding⟩ ::= ( ⟨symbol⟩ ⟨expr⟩ )
⟨defs⟩    ::= ( def ⟨binding⟩* )


Fig. 1: Overview of the grammar of RARE. 


An additional advantage of this approach is that, by relieving the RARE user 
from the burden of specifying the precise sort of variables in rewrite rules, it 
makes them both easier to write and less error-prone. At the same time, the 
loss of precision introduced by approximate sorts is not a serious hindrance in 
practice: both the SMT solver, which relies on RARE rules for proof elaboration, 
and IsaRARE, which uses them during proof reconstruction, are able to infer the 
exact sort represented by an approximate one thanks to their knowledge of the 
(exact) sort of the constant and function symbols in the supported SMT theories. 
Subsection 4.3 explains how IsaRARE recovers exact sorts by type inference fully 
automatically during the translation to Isabelle. 


3 The RARE Language 


The RARE language was introduced by Nötzli et al. [28]. As part of this work,
we have extended the language to be able to represent more rewrite rules. We 
present the full updated language here and summarize the differences with [28] 
at the end of the section. 

A RARE file contains a list of rules whose syntax is defined by the grammar in 
Figure 1. Expressions use SMT-LIB syntax with a few exceptions. These include 
the use of approximate sorts for parameterized sorts (e.g., arrays and bit-vectors) 
and the addition of a few extra operators (e.g., bvsize, described below). RARE 
uses SMT-LIB 3 syntax [6], which is very close to SMT-LIB 2 and mostly differs 
from its predecessor in that it uses higher-order functions for indexed operators. 

We say that an expression e matches a match expression m if there is some
matching substitution $\sigma$ that replaces each variable in m by a term of the same
sort to obtain e (i.e., $m\sigma$ is syntactically identical to e). For example, the match
expression (or (bvugt x1 x2) (= x2 x3)), with variables x1, x2, x3, all of sort ?BitVec,
is matched by (or (bvugt a b) (= b a)) but not by (or (bvugt a b) (= c a)), with a,
b, and c bit-vector constant symbols of the same bit-width.
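
The matching relation just defined can be sketched in a few lines. This is an illustrative sketch, not the matcher of any solver: expressions are modelled as nested tuples, the set of variables is passed explicitly, and sort checks are omitted for brevity. The function name `match` is an assumption.

```python
# Syntactic matching of an expression against a match expression.
# Expressions are nested tuples; sorts are ignored in this sketch.


def match(pattern, expr, variables, subst=None):
    """Return a substitution (dict) that turns pattern into expr, or None."""
    subst = dict(subst or {})
    if isinstance(pattern, str) and pattern in variables:
        if pattern in subst:
            # A variable seen before must be bound to the same term again.
            return subst if subst[pattern] == expr else None
        subst[pattern] = expr
        return subst
    if isinstance(pattern, tuple):
        if not isinstance(expr, tuple) or len(pattern) != len(expr):
            return None
        for p, e in zip(pattern, expr):
            subst = match(p, e, variables, subst)
            if subst is None:
                return None
        return subst
    # Operators and constants must agree literally.
    return subst if pattern == expr else None


PATTERN = ("or", ("bvugt", "x1", "x2"), ("=", "x2", "x3"))
VARIABLES = {"x1", "x2", "x3"}

ok = match(PATTERN, ("or", ("bvugt", "a", "b"), ("=", "b", "a")), VARIABLES)
bad = match(PATTERN, ("or", ("bvugt", "a", "b"), ("=", "c", "a")), VARIABLES)
```

On the example from the text, the first call yields the substitution mapping x1 to a, x2 to b, and x3 to a, while the second fails because x2 would have to be bound to both b and c.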


RARE Rules A RARE rewrite rule is defined with the define-rule command 
which starts with a parameter list containing variables with their sorts. These 
variables are used for matching as explained below. After an optional definition 
list (see below), there follow two expressions that form the main body of the rule: 
the match expression and the target expression. The semantics of a rule with
match expression m and target expression t is that any expression e matching
m under some sort-preserving matching substitution $\sigma$ can be replaced by $t\sigma$.
With approximate sorts, the sort preservation requirement is relaxed as follows.
In RARE, for any sort constructor S of arity n > 0, there is a corresponding
approximate sort (S ? ··· ?) with n occurrences of ?, which is always abbreviated
as ?S. A variable x with sort ?S (e.g., ?BitVec) in a match expression matches
all expressions whose sort is constructed with S (e.g., (_ BitVec 1), (_ BitVec 2),
and so on). Variables with sort ? match expressions of any sort.
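
To make the relaxed sort requirement concrete, the following sketch checks whether an exact sort is an instance of an approximate one. The encoding is an assumption made for illustration: sorts are strings or tuples such as ("BitVec", 8) and ("Array", "Int", "Real"), the string "?" stands for the unknown value or sort, and a string of the form "?S" stands for the abbreviated approximate sort. The function name `sort_matches` is hypothetical.

```python
# Checking an exact sort against an approximate sort.
# Encoding (assumed for this sketch): "Int", ("BitVec", 8), ("Set", "?"), "?Set", "?".


def sort_matches(approx, exact):
    """True if the exact sort is an instance of the approximate sort."""
    if approx == "?":
        return True  # ? matches any value or sort
    if isinstance(approx, str) and approx.startswith("?"):
        # ?S abbreviates (S ? ... ?): any sort built with constructor S.
        return isinstance(exact, tuple) and exact[0] == approx[1:]
    if isinstance(approx, tuple):
        return (isinstance(exact, tuple)
                and len(approx) == len(exact)
                and approx[0] == exact[0]
                and all(sort_matches(a, e) for a, e in zip(approx[1:], exact[1:])))
    return approx == exact  # precise sorts and concrete parameters
```

For instance, `sort_matches("?BitVec", ("BitVec", 8))` holds for every bit-width, whereas a precise sort such as ("BitVec", 1) only matches itself.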

An optional definition list may appear in a RARE rule immediately after 
the parameter list. It starts with the keyword def and provides a list of local 
variables and their definitions, allowing the rewrite rule to be expressed more 
succinctly. A rule with a definition list is equivalent to the same rule without it, 
where each variable in the definition list has been replaced by its corresponding 
expression in the body of the rule. For a rule to be well-formed, all variables in 
the match and target expressions must appear either in the parameter list or the 
definition list. Similarly, each variable in the parameter list must appear in the 
match expression (while this second requirement could be relaxed, it is useful 
for catching mistakes). Consider the following example. 


(define-rule bv-sign-extend-eliminate ((x ?BitVec) (n Int))
  (def (s (bvsize x)))
  (sign_extend n x) (concat (repeat n (extract (- s 1) (- s 1) x)) x))


In this rule, there are two parameters, x and n. The sort annotation ?BitVec 
indicates that x is a bit-vector without specifying its bit-width. The latter is 
stored in the local variable s using the bvsize operator. The rule says that a 
(sign_extend n x) expression can be replaced by repeating n times the most
significant bit of x and then prepending this to x. 
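
As a sanity check of what this rule claims, one can compare both sides by brute force on small bit-widths. The sketch below is not part of RARE or IsaRARE: bit-vectors are modelled as (width, value) pairs, the operator definitions follow the usual SMT-LIB reading of extract, concat, repeat, and sign_extend, and only n ≥ 1 is tested because repeat is assumed to require at least one copy.

```python
# Brute-force check of bv-sign-extend-eliminate on small bit-vectors.
# A bit-vector is modelled as a (width, value) pair with 0 <= value < 2**width.


def extract(hi, lo, x):
    """Bits hi down to lo of x."""
    width, value = x
    assert 0 <= lo <= hi < width
    return (hi - lo + 1, (value >> lo) & ((1 << (hi - lo + 1)) - 1))


def concat(x, y):
    """x followed by y, with x occupying the most significant bits."""
    (wx, vx), (wy, vy) = x, y
    return (wx + wy, (vx << wy) | vy)


def repeat(n, x):
    """n copies of x concatenated (n >= 1)."""
    assert n >= 1
    result = x
    for _ in range(n - 1):
        result = concat(result, x)
    return result


def sign_extend(n, x):
    """Extend x by n bits while preserving its two's complement value."""
    width, value = x
    signed = value - (1 << width) if value >> (width - 1) else value
    return (width + n, signed % (1 << (width + n)))


def rhs(n, x):
    """Right-hand side of the rule, with s = bvsize x."""
    s = x[0]
    return concat(repeat(n, extract(s - 1, s - 1, x)), x)


def rule_holds(max_width=4, max_n=3):
    """Compare both sides for all vectors up to max_width and 1 <= n <= max_n."""
    return all(sign_extend(n, (w, v)) == rhs(n, (w, v))
               for w in range(1, max_width + 1)
               for v in range(1 << w)
               for n in range(1, max_n + 1))
```

Such testing is of course no substitute for the Isabelle lemma generated for the rule; it only illustrates the intended semantics on finitely many instances.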

The define-cond-rule command is similar to define-rule except that it has 
an additional expression, the condition, immediately after the parameter and 
definition lists. This restricts the rule's applicability to cases where the condition 
can be proven equivalent to true under the matching substitution. In the example 
below, the condition (> n 1) can be verified by evaluation since in SMT-LIB, 
the first argument of repeat must be a numeral. 


(define-cond-rule bv-repeat-eliminate-1 ((x ?BitVec) (n Int)) 
(> n 1) (repeat n x) (concat x (repeat (- n 1) x))) 


Note that the rule does not apply to terms like (repeat 1 t) or (repeat 0 t).

Fixed-point Rules The define-rule* command defines rules that should be
applied repeatedly, to completion. This is useful, for instance, in writing rules
that iterate over the arguments of n-ary operators. Its basic form, with a body
containing just a match and target expression, defines a rule that, whenever it is
applied, must be applied again on the resulting term until it no longer applies.

The user can optionally supply a context to control the iteration. This is
a third expression that must contain an underscore. The semantics is that the
match expression rewrites to the context expression, with the underscore
replaced by the target expression. Then the rule is applied again to the target
expression only. In the example below, the :list modifier is used to represent
an arbitrary number of arguments, including zero, of the same type.


(define-rule* bv-neg-add ((x ?BitVec) (y ?BitVec) (zs ?BitVec :list)) 
(bvneg (bvadd x y zs)) (bvneg (bvadd y zs)) (bvadd (bvneg x) _)) 


This rule rewrites a term (bvneg (bvadd s t ...)) to the term (bvadd (bvneg s)
r) where r is the result of recursively applying the rule to (bvneg (bvadd t ...)).
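
The recursion performed by this fixed-point rule can be mimicked in a few lines. The following is a hedged sketch that models bit-vectors of width w as integers modulo 2^w; the function names are illustrative assumptions and do not correspond to any cvc5 or IsaRARE API.

```python
def bvneg(x, w):
    """Two's complement negation on w bits."""
    return (-x) % (1 << w)


def bvadd(args, w):
    """n-ary addition on w bits."""
    return sum(args) % (1 << w)


def neg_add_rewritten(args, w):
    """Apply the bv-neg-add rule to (bvneg (bvadd args...)) until it stops.

    The rule needs at least two summands (x and y); with fewer, the term
    is left as a plain negation.
    """
    if len(args) < 2:
        return bvneg(bvadd(args, w), w)
    x, rest = args[0], args[1:]
    # Context (bvadd (bvneg x) _), where _ is the recursively rewritten target.
    return bvadd([bvneg(x, w), neg_add_rewritten(rest, w)], w)
```

For every argument list, the rewritten value should agree with the direct computation bvneg(bvadd(args, w), w), which is what the corresponding lemma states.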


Changes to RARE Here, we briefly mention the changes to RARE with respect 
to [28]. First, we have support for a richer class of approximate sorts, including 
approximate bit-vector and array sorts. Also, we replaced the let construct by 
the new def construct. The definition list is more powerful as it applies to the 
entire rest of the body (whereas let was local to a single expression).

Additionally, to aid with bit-vector rewrite rules, we added several operators:
bvsize, which returns the width of an expression of sort ?BitVec; bv, which
takes an integer n and natural w, and returns a bit-vector of width w and value
n mod $2^w$; int.log2, which returns the integer (base 2) logarithm of an integer;
and int.islog2, which returns true iff its integer argument is a power of 2.
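
As a rough illustration of these operators, the sketch below models them on Python integers. It is not the cvc5 implementation. In particular, taking int.log2 to be the floor of the base-2 logarithm on positive inputs is an assumption, since the text only calls it the integer logarithm.

```python
def bv(n, w):
    """Bit-vector of width w with value n mod 2^w, as a non-negative integer."""
    return n % (1 << w)


def int_islog2(n):
    """True iff n is a power of 2."""
    return n > 0 and (n & (n - 1)) == 0


def int_log2(n):
    """Integer base-2 logarithm of a positive integer (floor, by assumption)."""
    if n <= 0:
        raise ValueError("int_log2 is only modelled for positive integers")
    return n.bit_length() - 1
```

For example, bv(-1, 4) yields the all-ones value 15, and int_log2(8) is 3 exactly because int_islog2(8) holds.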

We also removed the :const modifier, which was used previously to indicate 
that a particular expression had to be a constant value. We found that this 
adds complexity and is usually unnecessary. For rules that actually manipulate 
specific constant values, we can specify those values explicitly, e.g., by using the 
bv operator above. 


4 IsaRARE: from RARE Rewrites to Isabelle Lemmas 


In this section, we introduce IsaRARE, a plugin for Isabelle that automatically 
translates a RARE rule into an Isabelle lemma stating the correctness of the 
rule. Being able to generate such lemmas automatically is highly desirable, as 
RARE rules may be added and/or changed frequently for a given solver, or differ 
significantly between solvers, and manually translating RARE rules into lemmas 
is time-consuming and error-prone. IsaRARE can also suggest a proof sketch 
which is sometimes sufficient to prove the lemma. If this automatic proof fails, 
the user must provide the proof or determine that the lemma does not hold. In 
the latter case, Isabelle's counterexample-finder Nitpick [10] can be helpful. 


(define-cond-rule str-len-replace-inv ((t String) (s String) (r String))
(= (str.len s) (str.len r))
(str.len (str.replace t s r)) (str.len t))


lemma str_len_replace_inv:
fixes t::string and s::string and r::string
shows "smtlib_str_len s = smtlib_str_len r ⟶
smtlib_str_len (smtlib_str_replace t s r) = smtlib_str_len t"


Fig. 2: RARE rule and corresponding lemma. 


Figure 2 shows an example of a RARE rule (which simplifies the length of
the result of a string replacement) and the Isabelle lemma generated from it by
IsaRARE. Roughly speaking, a rule with parameters $x_1, \ldots, x_m$, definition list
$((y_1\ d_1) \cdots (y_n\ d_n))$, condition $c$, match expression $s$, and target expression $t$
is converted by IsaRARE into a lemma of the form $\forall x_1, \ldots, x_m.\ (c \Rightarrow s = t)\sigma$,
where $\sigma$ is the substitution $\{y_1 \mapsto d_1, \ldots, y_n \mapsto d_n\}$. Type inference in Isabelle
is used to suitably instantiate the ? wildcards in any approximate sorts in the
rules.

Next we discuss the main challenges we encountered while implementing the 
translation from RARE to Isabelle. 


4.1 Adding New Theories 


Since IsaRARE uses Isabelle’s SMT-LIB parser, it was necessary to extend it 
to handle SMT theories not previously supported and, in case there was no 
corresponding Isabelle theory, to define new types, definitions and theorems cor- 
responding to the SMT-LIB theory. For sets and arrays, Isabelle already provides 
the required data structures (Set.set and Map.map respectively) and definitions 
(e.g., union, and store). Translation from the SMT operators and types is thus 
straightforward, requiring only simple extensions to the parser. 

The SMT-LIB parser also had to be extended for the operators and sorts
of the SMT-LIB theory of strings. String terms are represented with Isabelle's
HOL.string, and regular expressions are represented as sets of strings. We
developed a new theory with auxiliary definitions and theorems meant to facilitate
the proving of lemmas generated by IsaRARE. Since strings are defined as lists
of characters, we were able to reuse many list operators for our definitions. For
example, string concatenation is defined as concatenation of lists.

As mentioned, bit-vectors are encoded in Isabelle using the word type, which
represents integers modulo $2^n$, where n is a type parameter (see Subsection 4.3).
Isabelle has support for reasoning about this type, but we still had to provide a 
number of extensions. For example, to translate bit-vector rewrite rules, we had 
to extend Isabelle's SMT-LIB parser significantly. We added support for all of 
the standard SMT-LIB operators, as well as some additional operators that cvc5 


(a) A RARE rule

(define-rule bv-extract-extract
((x ?BitVec) (i Int) (j Int) (k Int) (l Int))
(extract l k (extract j i x))
(extract (+ i l) (+ i k) x))


(b) Additional Assumptions

t0 = (extract j i x) ∧
size t0 = j + 1 - i ∧
t1 = (extract l k t0) ∧
size t1 = l + 1 - k ∧
t2 = (extract (i+l) (i+k) x) ∧
size t2 = (i+l) + 1 - (i+k) ∧
j < size x ∧ 0 ≤ i ∧ i ≤ j ∧
l < size t0 ∧ 0 ≤ k ∧ k ≤ l ∧
(i+l) < size x ∧ 0 ≤ (i+k) ∧
(i+k) ≤ (i+l)


Fig. 3: Implicit Assumption Generation


supports, such as bvuaddo (which checks for overflow from unsigned addition). It 
was also necessary to add several new definitions and basic theorems to Isabelle, 
for example for reasoning about the extract operator. 


4.2 Mismatch between Isabelle and SMT-LIB operators 


An important challenge for the translation concerns the mismatch between 
SMT-LIB operators and Isabelle functions. One of the main difficulties concerns 
implicit assumptions. As an example, consider the bit-vector extract operator.
The term (extract i j t) denotes the sub-vector of bit-vector t from index i
through index j, where i is the more significant index. SMT-LIB specifies that
the second index j must be at most i, and both indices must be in the range
[0, n), where n is the bit-width of t, making the result a bit-vector of width
i + 1 - j. These assumptions are necessary to correctly capture the semantics of
SMT-LIB's extract since the extract operator in Isabelle is more permissive.

There are several ways to address this issue. First, we could make the implicit
assumptions explicit in the RARE rules. However, this would be tedious and 
error-prone and would greatly clutter the RARE rules. It is also superfluous to 
always manually add them since the constraints are inherent in the SMT-LIB 
semantics. A second option is to write custom definitions for SMT-LIB operators 
in Isabelle that exactly match the SMT-LIB semantics (i.e., are undefined if the 
implicit assumptions do not hold). The main disadvantage of this approach is 
that it complicates proving the translated RARE rules, as those proofs cannot 
directly use any existing Isabelle lemmas that use the standard definitions. It 
also works against one of our long-term goals, which is to be able to use proof 
reconstruction to provide proofs for Isabelle conjectures, conjectures which will 
naturally use the existing Isabelle operators. 

The last option, which we adopted, is to automatically add the implicit as- 
sumptions during the translation of RARE rules to Isabelle lemmas. This does 
make the lemmas a bit more complicated, but it is the minimal complexity 
needed to bridge the semantic gap between the two extract operators. And, we 
can be confident that these implicit assumptions will easily be discharged when 
using the lemmas for proof reconstruction, since SMT proofs only use operators 
in ways that are consistent with SMT-LIB semantics (unless there is a bug, in
which case proof reconstruction should fail). Figure 3 shows an example of a
RARE rule with three applications of the extract operator, together with the 
assumptions added by IsaRARE. 
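
The role of these assumptions can be illustrated with a small experiment. The sketch below is an assumption-laden model, not IsaRARE code: it uses a deliberately permissive extract on Python integers (standing in for the more permissive Isabelle operator) and a hypothetical predicate assumptions_hold that mirrors the index conditions generated for the bv-extract-extract rule.

```python
def extract(hi, lo, x):
    """Permissive extract: bits hi..lo of x, with no range checks on hi."""
    width = max(hi - lo + 1, 0)
    return (x >> lo) & ((1 << width) - 1)


def assumptions_hold(i, j, k, l, size_x):
    """Index conditions for the three extract applications in the rule."""
    size_t0 = j + 1 - i
    return (j < size_x and 0 <= i <= j              # (extract j i x)
            and l < size_t0 and 0 <= k <= l         # (extract l k t0)
            and i + l < size_x and 0 <= i + k <= i + l)  # target extract


def rule_holds(x, i, j, k, l):
    """Match and target expressions of bv-extract-extract agree on x."""
    return extract(l, k, extract(j, i, x)) == extract(i + l, i + k, x)
```

With x = 0b1111, i = 0, j = 1, k = 0 and l = 3, the inner term has only two bits, so l is out of range. The permissive extract still returns a value on both sides, but the two sides differ, which is why the unconditioned equation cannot be proved as stated.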

In a few cases, we had to fall back on the custom definition approach. For 
example, we had to do this for the bit-vector concat operator for bit-vector 
concatenation. To see why, note that the SMT-LIB operator can take two or 
more arguments (abbreviating nested binary applications), each with arbitrary 
bit-width. Recall that the :list annotation in RARE can be used to specify 
a variable number of arguments. There is no way to even state lemmas cor- 
responding to rewrite rules involving concatenations of a variable number of 
arguments in Isabelle using its built-in binary concatenation operator. For this 
case, we thus define a custom concatenation operator that matches the SMT- 
LIB semantics. The implicit assumption that the bit-width of the result is the 
sum of the bit-widths of the arguments is embedded in the custom definition. 
Using the new definition, we can translate the problematic rules into Isabelle 
lemmas. As expected, proving these lemmas requires extra work. Specifically, it 
requires formulating and proving bridging theorems between Isabelle's built-in 
concatenation operator and the new one we defined. 


4.3 Supporting Approximate Sorts 


With the addition of approximate sorts to RARE, we had to extend Isabelle's 
SMT-LIB translator to support them. We observe that Isabelle/HOL is not 
based on a dependently-typed logic. However, it supports an encoding of sorts 
depending on integer values into polymorphic types with parameters that range 
over types expressing ordinals. In particular, bit-vectors of width n are
represented by the type (n word) of integers modulo $2^n$; for instance, 3::(8 word)
represents an integer with value 3 modulo $2^8$. In fact, thanks to polymorphism,
it is possible for the bit-width to be a type variable (e.g., 3::('a::len word)). 
Note that this is more precise than allowing the bit-width in the type to be com- 
pletely unknown, as in approximate sorts: with type parameters one can state, 
for instance, that two terms of unknown bit-width have the same width, whereas 
two terms both of sort ?BitVec may have different bit-widths. 

Conveniently then, all the approximate sorts in RARE correspond to poly- 
morphic types in Isabelle. For instance, ?BitVec corresponds to 'a word and 
?Array corresponds to ('a,'b) map where 'a and 'b are type variables. During 
parsing, each occurrence of an approximate sort is converted into an instance of
the corresponding polymorphic type obtained by instantiating each sort vari- 
able with a fresh dummy type. For some bit-vector operators, the output sort 
is dependent on the input sorts (e.g., extract and concat as mentioned above). 
For applications of such operators, we also use a dummy type for the bit-width 
of each argument for which the width is not known. Once translation is done, 
we use Isabelle's type inference algorithm to concretize each dummy type to a 
monomorphic one. For example, during translation of the rule bv-ugt-eliminate 
below, the variables x and y would both be assigned dummy types. 


(define-rule bv-ugt-eliminate ((x ?BitVec) (y ?BitVec))
(bvugt x y) (bvult y x)
)


However, bvugt requires that both of its arguments be bit-vectors of the same 
width in SMT-LIB. This restriction is either already present in the definition 
in Isabelle that we map an operator to, or added during parsing as an implicit 
assumption, as we describe in Section 4.2. The type inference algorithm then 
computes the most general type for x and y that satisfies all assumptions. In this 
case, it correctly infers that they are bit-vectors of arbitrary but equal bit-width. 


4.4 List Parameters 


As mentioned earlier, SMT-LIB supports multi-arity syntax for certain binary
operators, and RARE supports a variable number of arguments via the :list
annotation. In contrast, in Isabelle all operators are fixed-arity. To facilitate the
translation in these cases we added a new datatype, 'a rare_ListVar, with a
single constructor ListVar::'a list ⇒ 'a rare_ListVar to encapsulate multiple
arguments in a list. We also introduced two second-order operators, called
rare_list_left and rare_list_right, to encode RARE left-associative and
right-associative operators, respectively. As an example, a Boolean term of the form
(and x1 ... xn y z) is translated to the Isabelle term (rare_list_right (∧)
(ListVar [x1, ..., xn]) (y ∧ z)). The rare_list_left and rare_list_right
functionals fold the operator passed as first argument over the list stored in their
second argument to obtain properly nested binary applications. For example, if
n = 2, the Isabelle term above is translated to (x1 ∧ (x2 ∧ (y ∧ z))).
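
The folding behaviour described above can be sketched outside Isabelle. The Python functions below are only an analogy: they reuse the names rare_list_right and rare_list_left for readability, and the argument order of rare_list_left is an assumption, since the text gives an example only for the right-associative case.

```python
from functools import reduce


def rare_list_right(op, xs, tail):
    """Right fold: op(x1, op(x2, ... op(xn, tail)))."""
    return reduce(lambda acc, x: op(x, acc), reversed(xs), tail)


def rare_list_left(op, head, xs):
    """Left fold: op(... op(op(head, x1), x2) ..., xn)."""
    return reduce(op, xs, head)


def show_and(a, b):
    """Build a printable binary conjunction so the nesting is visible."""
    return f"({a} & {b})"
```

Calling rare_list_right(show_and, ["x1", "x2"], "(y & z)") produces the nested term from the n = 2 example, and an empty list simply returns the tail argument unchanged.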

For every multi-arity SMT-LIB operator, we prove that it can be built up 
from Isabelle's built-in binary version using fold(r) functions. For RARE rules 
with list parameters, these transfer lemmas become part of the correctness proof 
automatically generated by IsaRARE. When proving the corresponding lemma, 
we can take advantage of the many lemmas in Isabelle's libraries about fold 
functions without having to know the internals of the translation process. 

If we have a RARE rule in which all arguments to an operator are lists, we 
must handle the special case when the lists are all empty. When the operator 
has an identity element, we return that. For example, applications of and to just 
empty lists are translated as standing for true. So far, we have only encoun- 
tered one operator without an identity: bit-vector concatenation. Since neither 
SMT-LIB nor Isabelle supports bit-vectors of bit-width 0, for that operator, we
explicitly add an assumption ruling out the case where all lists are empty. 


4.5 Writing Lemmas and their Proofs 


To generate a lemma from a RARE rewrite rule, IsaRARE first introduces the
parameters with their types using Isabelle's fixes construct. Next, it generates the
statement of the lemma, the goal, which states that the implicit assumptions and 
conditions imply the equality of the match and target terms. The types of any 
bit-vector constants are fully specified (via type ascription), because otherwise
the lemma may be too general and not hold. 

Lastly, IsaRARE adds an Isabelle proof of the lemma. For lemmas that do not
contain lists, this is simply a call to the main automatic tactic auto. Otherwise,
the list constructs are eliminated as explained above, and any transfer lemmas
are applied to the resulting terms. This ensures that goals will not contain any
IsaRARE list definitions. We then invoke induction for every list and use the
simp_all tactic to attempt to solve and simplify the goals.

The proof is printed in apply style so that it can be easily modified and
completed manually if Isabelle is unable to discharge all its sub-goals automatically.


4.6 Availability 


IsaRARE currently supports the theories of uninterpreted functions, linear
arithmetic, bit-vectors, arrays, strings, and sets. It is publicly available⁷ under the
BSD 3-Clause license. We plan to submit IsaRARE to the Archive of Formal
Proofs [20]. We have also been working with the Isabelle maintainers to have 
our extensions to Isabelle itself (e.g., to the SMT-LIB parser) included in the 
official Isabelle distribution. Many features were already included in the latest
release. IsaRARE requires the Word_Lib library (which is also included in
the Archive of Formal Proofs) if it is used on RARE rules containing bit-vector
operators not present in Isabelle itself.


5 Evaluation and Experience 


We used IsaRARE to help develop, translate, and verify new RARE rewrite rules.
These rules were designed to address coarse-grained rewrite steps appearing
in cvc5 proofs, i.e., steps that could not be elaborated into fine-grained steps
using the existing RARE rules and the approach mentioned in Section 2.2. In
this section, we report on this experience and also discuss challenges arising 
from particular rewrites and theories. 


5.1 Impact of New Rewrites on cvc5 Proof Holes 


Previous work developed 85 RARE rules for cvc5 [28]. For our evaluation, we
ran cvc5 with these plus our 163 new rules, bringing the total number of RARE
rules in the cvc5 database to 248. We evaluated the impact of the new rules on
cvc5's ability to produce fine-grained proof steps by comparing the success rate
of the elaboration (i.e., percentage of rewriting proof steps that are successfully
elaborated into fine-grained steps) before and after the addition of the new rules.
We ran cvc5 on 70,709 unsatisfiable benchmarks, as determined by cvc5 [2,
Sec. 4], in the SMT-LIB logics containing quantifier-free problems with equality
and uninterpreted functions, arrays, linear arithmetic, strings, and bit-vectors.


7 https://github.com/cvc5b/IsaRARE


                 rewrites
theory        old   new   proven   autoproven
EUF            22    43       43           37
Arithmetic     23    22       22           14
Sets            0     7        7            7
Arrays          0     4        4            4
Strings        40    57       57           37
Bit-vectors     0   115       84           62


Table 1: Rule and rule verification counts per theory


The results were generated with a cluster equipped with 16 x Intel(R) Xeon(R)
CPU E5-2637 v4 @ 3.50GHz, 62.79 GiB RAM machines, with one core per
solver/benchmark pair, 1200s time limit, and 8 GB memory limit.

For string benchmarks (the only set evaluated in [28]), the success rate went
from 92% to 98%. Results on the logics with equality and uninterpreted
functions, arrays, and linear arithmetic were similar. By far the most challenging
theory, in terms of rewrite rules, is the bit-vector theory. Prior to our work,
there were no RARE rules for this theory, so no bit-vector rewrite steps could
be turned into fine-grained steps. With our 115 new RARE rules for bit-vectors,
92% of coarse-grained bit-vector rewrite steps are successfully elaborated into
fine-grained steps. We see this as tremendous progress towards full fine-grained
proofs for bit-vector problems.


5.2 Translating and Verifying Rewrites 


In Table 1, we list the number of rules in each theory, distinguishing between
how many were there before (old) and the total including both the old rules and
our new rules (new).⁸ We also show how many of the lemmas we have successfully
proven and how many of these were done automatically, i.e., either by the proof 
suggested by IsaRARE or by a single call to Sledgehammer. The proven column 
shows that all non-bit-vector rules as well as most of the bit-vector rules have 
now been proven. The numbers in the last column show that most of the proofs 
were provided automatically by IsaRARE. 

For the theory of strings, the number of lemmas automatically proven is not 
clear-cut. For other theories, libraries with useful background lemmas already 
existed, but for strings we had to add many new general-purpose lemmas our- 
selves and then decide whether these should count as background lemmas or as 
part of the proof effort for a rewrite rule. We were rather conservative in that 
decision, i.e., we did not count a lemma as automatically proved if it used a 
lemma whose classification as a background lemma was in doubt. Many of the 


8 Consolidation in the set of arithmetic rules actually resulted in one fewer rule than 
existed previously. 
translated string rewrites had to be proved manually because they required
induction on string length, especially since many operators are defined inductively.
However, we found that most of these manual proofs were fairly easy once an 
appropriate induction variable was selected. 

There are no performance issues: IsaRARE translates most files in milliseconds.
Even for our biggest RARE database, the one containing bit-vector rules,
IsaRARE took only around 1-2 seconds on our machine.


5.3 Bugs Found in String Rules 


We found several bugs in the existing RARE rules for strings by using Isabelle's
counterexample finder Nitpick [10] on the translated Isabelle lemmas. We
diagnosed and fixed each of them, so that now they can all be verified.⁹ The bugs
fall into three main categories.


Misinterpreted Semantics: The str.substr operator takes three arguments and 
returns the substring of the first argument, starting at the position given by the 
second argument, and continuing for the number of characters specified by the 
third argument. The following (corrected) rule simplifies a substring expression 
to the empty string whenever the third argument is 0 or negative. 


(define-cond-rule str-substr-empty-range ((x String) (n Int) (m Int))
(>= 0 m) (str.substr x n m) "")


However, the first version of the rule had the wrong condition: (>= n m) rather 
than (>= 0 m). This is likely due to the rule's author mistaking the third argu- 
ment of str.substr for an absolute index instead of a relative offset. 


Forgotten Condition: The corrected rule below says that, under some
assumptions, the length of a substring term is equal to the offset (third) argument.


(define-cond-rule str-len-substr-in-range ((s String) (n Int) (m Int))
(and (>= n 0) (>= m 0) (>= (str.len s) (+ n m)))
(str.len (str.substr s n m)) m)


The earlier version of the rule did not include the condition (>= m 0). This,
however, makes it unsound, because according to the semantics of str.substr, if the
offset is negative, the result is just the empty string. This led to a
counterexample with a negative value for m. Note that this condition is not automatically
added by IsaRARE since str.substr is defined for negative offsets.
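
Both string bugs described so far can be reproduced with a small model of str.substr. The sketch below is illustrative only: str_substr follows the SMT-LIB convention that an out-of-range start or a non-positive length yields the empty string, and the two checker functions are hypothetical helpers that search a finite space for counterexamples, loosely in the spirit of what Nitpick does on the Isabelle lemmas.

```python
def str_substr(s, i, n):
    """Model of SMT-LIB str.substr: at most n characters of s from index i."""
    if i < 0 or i >= len(s) or n <= 0:
        return ""
    return s[i:i + n]


def empty_range_sound(cond, strings, ints):
    """True if cond(n, m) always forces (str.substr x n m) to be empty."""
    return all(str_substr(x, n, m) == ""
               for x in strings for n in ints for m in ints if cond(n, m))


def len_in_range_sound(cond, strings, ints):
    """True if cond(s, n, m) always forces the substring length to equal m."""
    return all(len(str_substr(s, n, m)) == m
               for s in strings for n in ints for m in ints if cond(s, n, m))


STRINGS = ["", "a", "ab", "abc"]
INTS = range(-2, 5)
```

On this search space, the original condition (>= n m) of the first rule is refuted by x = "abc", n = 1, m = 1, and the second rule without (>= m 0) is refuted by s = "abc", n = 0, m = -1, whereas the corrected conditions pass.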


Misunderstanding the Rewrite: One rule was designed to closely mirror a
piece of cvc5 code implementing a rewrite, but it failed to properly capture all
cases. The code involved included several conditionals resulting in two different
ways a term could be rewritten. The original rule only captured one of the two 
cases and even missed one of the conditions for the case it included. Since this 
rule was quite complex and was only incorrect for some corner cases, it would 
have been challenging to find this bug without our verification effort. 


9 Fortunately, none of the bugs in rules corresponded to buggy code in cvc5 itself.
However, cvc5 could have used those rules to construct incorrect proofs.


5.4 Bit-vector Rewrite Rules


Bit-vector theory solvers make extensive use of rewriting, employing large
numbers of rewrite rules. In order to define RARE rules for cvc5's bit-vector theory,
we began by analyzing the cvc5 rewriting code, which implements a total of
99 rewrite methods. We then wrote RARE rules to try to capture the behavior
of these methods. There are 5 methods that are too complex to be captured by
RARE (or by any straightforward extension of it). For each of these, we instead
added new hard-coded proof rules to the cvc5 proof rule database.¹⁰ These
hard-coded proof rules are not included in Table 1, but they are used to help
demonstrate the overall progress on SMT-LIB proofs (Section 5.1). The long-
term plan for reconstruction of proofs using these rules is to write custom Isabelle 
tactics for reconstructing those proof steps. 

Unlike with the string rules, where we applied IsaRARE to already-written 
rules, we used IsaRARE extensively to help debug the bit-vector rules as they 
were being written. We were able to quickly and easily find many kinds of mis- 
takes this way. For example, rule authors mixed up bvneg (unary 2's complement 
negation) and bvnot (bit-wise Boolean negation). In other cases, rules used in- 
consistent bit-widths. The type inference that IsaRARE performs is particularly 
helpful in such cases, as it is stricter than the cvc5 RARE parser. 

Many of the bit-vector rules can be proved automatically, but others must 
be proved manually and are quite challenging, especially those involving signed 
arithmetic or division. Despite this, as shown in Table 1, the process of manually 
proving the full set of bit-vector lemmas is largely complete. This is important 
for our long-term goal of reconstructing SMT proofs in Isabelle. 


6 Conclusion 


We presented IsaRARE, a tool providing an automatic pipeline for verifying
rewrite rules. We showed the effectiveness of our approach by proving the
correctness of a large number of rewrite rules used in cvc5 proofs. Our experiments
show that many lemmas can be proved with minimal user interaction.

This work is also part of a long-term project that aims to further automate
proof search in Isabelle. The goal is to be able to reconstruct any cvc5 proof in
Isabelle's internal inference engine. This, of course, also includes reconstructing
rewrite steps. The lemmas IsaRARE generates are directly applicable to this 
effort. We plan to provide a detailed description and evaluation of this larger 
effort in future work. 


Data Availability Statement The datasets generated and analyzed during the 
current study are available in the Zenodo repository: https://zenodo.org/ 
records/10048664 [24]. 


10 This is analogous to the handling of polynomial normalization in [28]. 


Abstract. Modern deductive verification tools succeed in automatically
proving the great majority of program annotations thanks in particular 
to constantly evolving SMT solvers they rely on. The remaining proof 
goals still require interactively created proof scripts. This tool demo pa- 
per presents a new solution for an automatic creation of proof scripts 
in Frama-C/WP, a popular deductive verifier for C programs. The veri- 
fication engineer defines a proof strategy describing several initial proof 
steps, from which proof scripts are automatically generated and applied. 
Our experiments on a large real-life industrial project confirm that the 
new proof strategy engine strongly facilitates the verification process by 
automating the creation of proof scripts, thus increasing the potential of 
industrial applications of deductive verification on large code bases. 


Keywords: deductive verification, proof automation, interactive proof scripts, 
proof strategies, Frama-C. 


1 Introduction 


Recent years have seen many successful applications of deductive verification [7, 
8]. Modern deductive verifiers manage to automatically prove the greatest num- 
ber of proof goals, also called proof obligations, or verification conditions (VCs). 
This is in particular due to powerful and constantly evolving SMT solvers they 
rely on. The remaining unproven goals typically require some form of interac- 
tive proof: either with a proof script indicating a few initial proof steps to make 
the goal more suitable for an automatic prover, or a fully interactive proof in a 
proof assistant like Coq. The need for an interactive proof remains an important 
obstacle to a wider application of deductive verification on large projects. 

It can be illustrated by a recent proof [6] of real-life smart card code, a
JavaCard Virtual Machine (JCVM), that was performed by Thales for the highest
EAL6-EAL7 levels of Common Criteria certification⁴ using Frama-C/WP [9],


4 The EAL7 certificate delivered by the French certification body ANSSI 
is available at https://cyber.gouv.fr/sites/default/files/document_type/ 
Certificat-CC-2023 4bfr. O.pdf. 
a popular deductive verifier for C programs. Even if a very high level of
automation is achieved in that project and less than 2% of proof goals require manually
created proof scripts, a significant effort is still required for the remaining goals 
because hundreds of properties are concerned. 

Moreover, proof scripts are sensitive to the versions of the deductive verifier, 
of the code and the specification. Thus, proof scripts not only need to be created 
once for a given version of the target code, its specification and the verifier, but 
often have to be recreated when the code or the specification are updated, or the 
verifier evolves (and hence the way to generate VCs is modified). Thus, the need 
for manually created proof scripts for the unproven goals is seen as an important 
obstacle to a better maintenance of the proved code in the industrial setting. 

This tool demo paper presents a new mechanism⁵ for an automatic creation
of proof scripts in Frama-C/WP. The verification engineer defines a proof strat- 
egy describing the alternative proof steps to be tried, from which proof scripts 
are automatically generated and applied. Our experiments on the JCVM verifi- 
cation project confirm that the new mechanism strongly facilitates the verifica- 
tion process, thus increasing the potential of industrial applications of deductive 
verification on large code bases. 

The contributions of this work include a demonstration of the new mech- 
anism for automating the creation of proof scripts in Frama-C/WP based on 
user-defined proof strategies, its illustration on simple examples and its evalua- 
tion on a real-life industrial project. 


2 Deductive Verification with Frama-C/WP 


Frama-C is an open-source, industrially mature, extensible framework for ver- 
ifying C programs annotated with ACSL [2] specifications. The WP plug-in of 
Frama-C allows the user to prove that the C code respects the ACSL specifications 
using deductive verification [7,8]. More precisely, WP implements an efficient 
variant of weakest precondition calculus [10], hence the name of the plug-in. 

ACSL specifications, written inside special comments “/*@...*/”, basically 
consist of function contracts and code annotations. Function contracts include 
pre-conditions (requires clauses) and post-conditions (ensures clauses), con- 
taining pure logical formulas that shall be verified respectively before and after 
any call to a function. The assigns clause specifies the possible side effects of the 
function on global variables and pointers received in parameters. Code annota- 
tions (e.g. assert clauses) contain pure logical formulas attached to a particular 
program point that shall be verified at each execution path going through this 
program point. These clauses are illustrated by the program below: 


1 /*@
2 requires 0 <= x <= y;
3 ensures \result == (x + y) / 2;
4 assigns \nothing;
5 */
6 int middle(int x, int y)


5 publicly available on https://git.frama-c.com/pub/frama-c/ as part of the cur- 
rent development version (and in the upcoming release planned for November 2023). 


7 {
8 /*@ assert 0 <= y - x <= MAX_INT; */
9 /*@ assert 0 <= x + (y-x) / 2 <= MAX_INT; */
10 return x + (y - x) / 2;
11 }


Frama-C contains plug-ins that can generate assertions, and plug-ins that 
can prove assertions, or both. For instance, the RTE plug-in can generate code 
annotations that are sufficient for the program to never go into unspecified or 
undefined behaviors. The assertions in the previous example (Lines 8-9) show 
two of the five assertions generated by RTE on this code. 
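
To see why the function computes x + (y - x) / 2 rather than (x + y) / 2 directly, the following sketch replays the contract and the two displayed assertions on unbounded Python integers. It is an illustration under stated assumptions, not output of Frama-C or RTE: MAX_INT is taken to be 2^31 - 1 (a 32-bit int), and middle_checked is a hypothetical name.

```python
MAX_INT = 2**31 - 1  # assumption: 32-bit signed int


def middle_checked(x, y):
    """Replay middle() with its precondition and range assertions made explicit."""
    assert 0 <= x <= y <= MAX_INT          # requires clause, plus the int range
    diff = y - x
    assert 0 <= diff <= MAX_INT            # first displayed assertion
    result = x + diff // 2                 # // equals C division on non-negative values
    assert 0 <= result <= MAX_INT          # second displayed assertion
    return result


def naive_sum_overflows(x, y):
    """True if the intermediate x + y would not fit in a signed int."""
    return x + y > MAX_INT
```

For x = MAX_INT - 1 and y = MAX_INT, the naive sum x + y exceeds MAX_INT, while every intermediate value in middle_checked stays in range and the result still equals (x + y) / 2 computed over the mathematical integers, as the ensures clause requires.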

WP is able to prove code annotations written by the user or generated by 
other plug-ins. It works by deductive verification: ACSL logic formulas and
C instructions are translated into equivalent pure logic formulas in a
first-order logic language. Each generated formula is first simplified by a built-in
solver named Qed [4] and then submitted to external provers, generally auto- 
mated SMT solvers such as Alt-Ergo, Z3, CVC4 or CVC5. On the above program, 
WP can prove all ACSL annotations written by the user and generated by RTE: 


$ frama-c -wp -wp-rte middle.c
[rte:annot] annotating function f
[wp] 8 goals scheduled
[wp] Proved goals: 8/8
  Qed: 2
  Alt-Ergo 2.4.2: 6 (4ms-14ms)


In this example, RTE generated 5 annotations and WP generated 8 formulas
for proving all resulting ACSL annotations; 2 of them were proved by Qed
simplification, and the remaining 6 were proved by Alt-Ergo in a few milliseconds.
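To make the idea of a weakest precondition concrete, here is a minimal Python sketch. It is not how WP is implemented: predicates are modelled as Python functions over a state dictionary, the helper names (assign, wp, vc_holds) are hypothetical, and validity is checked by enumeration over a small set of states instead of being sent to a solver. It only uses the textbook rules wp(v := e, Q) = Q[e/v] and wp(S1; S2, Q) = wp(S1, wp(S2, Q)).

```python
from itertools import product


def assign(var, expr):
    """Predicate transformer for `var := expr`: Q becomes Q with var replaced by expr."""
    def transformer(post):
        return lambda state: post({**state, var: expr(state)})
    return transformer


def wp(program, post):
    """Weakest precondition of a statement sequence, computed from the last statement."""
    for statement in reversed(program):
        post = statement(post)
    return post


def vc_holds(pre, program, post, states):
    """Check `pre ==> wp(program, post)` on every given state (brute force)."""
    condition = wp(program, post)
    return all(condition(s) for s in states if pre(s))


# Contract and body of `middle`, on mathematical integers.
pre = lambda s: 0 <= s["x"] <= s["y"]
post = lambda s: s["result"] == (s["x"] + s["y"]) // 2
program = [
    assign("d", lambda s: s["y"] - s["x"]),
    assign("result", lambda s: s["x"] + s["d"] // 2),
]
states = [{"x": x, "y": y} for x, y in product(range(20), repeat=2)]
```

Calling vc_holds(pre, program, post, states) checks the verification condition on the sampled states; replacing the body by an incorrect one (for instance returning y / 2) makes the check fail.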


3 Automated vs. Interactive Proofs 


In most cases, ACSL annotations are automatically proven by Qed and SMT
solvers. However, an automated proof may sometimes fail for a correct formula,
because deductive verification in general, and WP in particular, is not complete.

In such a situation, WP offers different features to complete the proofs. First, 
the user might help SMT solvers by introducing intermediate code annotations, 
hence providing proof hints and intermediate proof results. Second, the user 
might enter the interactive proof mode with the Frama-C graphical interface, 
in which the user can apply so-called tactics to transform a proof goal into a 
conjunction of several, typically simpler ones, that WP can try to prove in turn. 
This process can be iterated, and all the applied tactics can be saved on disk in 
a proof script file that can be replayed later from the Frama-C command line. 

After some effort, the user can thus achieve full automation in
proof replay for a proof campaign: all proof goals are discharged automatically 
by SMT solvers, possibly thanks to proof hints provided as code annotations, 
and possibly after applying tactics from saved proof scripts. 

WP offers a large variety of tactics. Common ones include splitting over 
a boolean expression; brute-forcing an integer expression within a given range 
(detailed below); unfolding predicate or function definitions; removing hypothe- 
ses; etc. Applying tactics is simple in spirit, although it raises complex issues in 
practice. Consider for instance the Range tactic, which can be defined as follows,
where φ is the current goal, e some expression and a < b two integer constants:

\[
\mathrm{range}(\varphi, e, a, b) \;=\; \bigwedge_{k \in a..b} (e = k \Rightarrow \varphi) \;\wedge\; (e < a \Rightarrow \varphi) \;\wedge\; (e > b \Rightarrow \varphi)
\]

Applying it on goal φ consists in replacing⁶ φ by range(φ, e, a, b). This requires
having at hand the expression e and the two constants a and b. Under the graphical
user interface (GUI), those arguments are selected by the user from the goal.
However, bookkeeping them in a proof script is not that simple, especially if we
want the proof script to be robust to minor changes in the code or the specifications.
WP has dedicated features for this purpose, but only up to a certain extent.
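The case split performed by the Range tactic can be sketched in Python as follows. This is an illustration only: goals and expressions are modelled as Python functions over a state, the function name range_tactic is hypothetical, and the sub-goal labels follow the "Lower", "Upper" and "Value" naming used by WP for this tactic.

```python
def range_tactic(goal, e, a, b):
    """Split `goal` into one sub-goal per case of the integer expression `e`.

    Each sub-goal has the shape `case ==> goal`; since the cases e < a,
    e = a, ..., e = b, e > b cover all integers, proving every sub-goal
    proves the original goal.
    """
    subgoals = {}
    for k in range(a, b + 1):
        # k=k freezes the current value of k inside the lambda.
        subgoals[f"Value {k}"] = lambda s, k=k: e(s) != k or goal(s)
    subgoals[f"Lower {a}"] = lambda s: not (e(s) < a) or goal(s)
    subgoals[f"Upper {b}"] = lambda s: not (e(s) > b) or goal(s)
    return subgoals
```

For a range 0..255 this yields 258 sub-goals, each of which fixes (or bounds) the value of e and is therefore typically much easier for an SMT solver.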

In practice, managing proof scripts during the lifetime of large projects is
an industrial issue. By contrast, proof hints in the form of intermediate
code annotations are quite robust. However, writing code annotations by hand
is tedious. On the other hand, applying tactics to decompose goals is quite
efficient, and it appears that, on a given application, many pending goals are
solved by applying a few tactics with very similar patterns. Those observations
led us to the design of proof strategies.


4 Definition of Proof Strategies 


This section introduces the main principles and selected features of proof strate- 
gies through illustrative examples, which can be tested using the companion 
artifact [5]. We refer the reader to the WP manual [1] for a detailed description. 

Proof strategies are user-defined specifications for combining automated sol- 
vers with pattern-driven tactics. A proof strategy consists of a list of alternatives 
to be tried in sequence on a proof goal until success. Elementary alternatives 
consist in trying one or several SMT solvers with a specified timeout, or applying 
a tactic on a goal. Lists of alternatives can be grouped and given a (strategy) 
name, that can be used as an elementary alternative as well. Then, specific 
proof strategies can be associated to specific proof goals, functions or lemmas. 
For instance, the user may associate proof strategy A to every code annotation 
with name P and proof strategy B to every code annotation without name Q, 
and finally proof strategy C to other code annotations. 
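The "alternatives tried in sequence until success" reading can be sketched as a small interpreter. The Python below is a toy model, not WP's engine: goals are nested tuples, the prover and the tactic are stand-ins written for this example, and it assumes that a tactic alternative succeeds only when all sub-goals it produces are proved by the strategy attached to them.

```python
def run_strategy(strategy, goal):
    """Try each alternative in order; return True as soon as one proves the goal."""
    for alternative in strategy:
        kind = alternative[0]
        if kind == "prover":
            _, prover, timeout = alternative
            if prover(goal, timeout):
                return True
        elif kind == "tactic":
            _, tactic, children = alternative
            subgoals = tactic(goal)  # None means the tactic does not apply
            if subgoals is not None and all(
                run_strategy(children, g) for g in subgoals
            ):
                return True
    return False


def toy_prover(goal, timeout):
    """Stand-in for an SMT solver: only proves the atomic goal True."""
    return goal is True


def split_and(goal):
    """Stand-in tactic: split a conjunction ("and", g1, ..., gn) into its parts."""
    if isinstance(goal, tuple) and goal and goal[0] == "and":
        return list(goal[1:])
    return None


# A strategy that first tries the prover, then splits and applies itself
# recursively to the children (the list refers to itself on purpose).
split_then_prove = [("prover", toy_prover, 1)]
split_then_prove.append(("tactic", split_and, split_then_prove))
```

The self-reference in split_then_prove mirrors a strategy that names itself for the sub-goals produced by a tactic; the recursion stops because the tactic no longer applies once a goal is atomic.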

Proof strategies and their association to proof goals are user-written as spe- 
cific ACSL extensions defined and managed by the Frama-C/WP plug-in. An 
overview of these annotations is provided below: 


strategy strategyname : alternative ,..., alternative ; 
proof strategyname : target ,..., target ; 


The strategy clause introduces a new proof strategy strategyname, whereas
the proof clause associates it to some property targets, i.e. individual goals
or sets of goals, using the same syntax as for the frama-c command line, which
eases the learning curve for users. As introduced above, each alternative might
consist of:


– \prover(p,...,p,time) tries the specified provers in sequence with
a specified timeout.

– \tactic(id,param...) tries to apply the specified tactic with the associated
parameter(s).

– strategyname or \default tries the specified named strategy.


6 We have range(φ, e, a, b) ⇒ φ, which is sufficient for the tactic to be safely applied.


Parameters for applying tactics are the most expressive but also the most
complex components of proof strategies. As briefly introduced in the previous section,
a tactic transforms a proof goal into one or several sub-goals that are sufficient
to entail the initial goal. The difficult point with tactics is that they need
parameters to be applied. For instance, the tactic range illustrated in the previous
section must be applied to an expression and a range of two integer constants.
From the Frama-C GUI, proof engineers often pick those parameters from the 
goal itself, according to some patterns of interest and their experience. Our proof 
strategy language allows proof engineers to specify those patterns, and to build 
tactic parameters with required values accordingly. 

A trade-off between robustness and precise definition of tactic applications is
an important design objective. The proposed strategy language allows significant
flexibility in choosing precise (and less robust) or more general (and more
robust) patterns. The latter include '_' for any expression, '..' for any number
of arguments, 'A:_' to introduce a variable to name a subexpression and to use
it in a tactic parameter or a pattern to select, etc.
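How such patterns select a sub-expression can be sketched with a small matcher. The Python below is a simplified model under stated assumptions: terms are tuples of the form (head, arg, ...), the binding 'A:_' is encoded as ("bind", "A", "_"), and '..' is only supported as the last argument of a pattern. The encoding and the function names match and find are hypothetical and do not reflect WP's implementation.

```python
def match(pattern, term, env=None):
    """Return the variable bindings if `term` matches `pattern`, else None."""
    env = dict(env or {})
    if pattern == "_":                      # wildcard: any expression
        return env
    if isinstance(pattern, tuple) and pattern[0] == "bind":
        _, name, sub = pattern              # name a sub-expression
        env = match(sub, term, env)
        if env is not None:
            env[name] = term
        return env
    if isinstance(pattern, tuple):
        if not isinstance(term, tuple) or term[0] != pattern[0]:
            return None
        pargs, targs = list(pattern[1:]), list(term[1:])
        if pargs and pargs[-1] == "..":     # any number of remaining arguments
            pargs.pop()
            if len(targs) < len(pargs):
                return None
            targs = targs[:len(pargs)]
        elif len(pargs) != len(targs):
            return None
        for p, t in zip(pargs, targs):
            env = match(p, t, env)
            if env is None:
                return None
        return env
    return env if pattern == term else None


def find(pattern, term):
    """Bindings for the first sub-term (pre-order) matching `pattern`, else None."""
    env = match(pattern, term)
    if env is not None:
        return env
    if isinstance(term, tuple):
        for arg in term[1:]:
            env = find(pattern, arg)
            if env is not None:
                return env
    return None
```

A general pattern such as ("P_valid_heap_model", "..") keeps matching when the predicate gains or loses arguments, which is the robustness side of the trade-off; a pattern spelling out every argument is more precise but breaks more easily.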

Consider lemma dn3 in Fig. 1, not proved by Alt-Ergo. It can now be proved 
by associating to it the following strategy (we omit surrounding /*@...*/): 


1 strategy RangeThenProver:
2   \tactic ("Wp.range",
3     \pattern(is_uint8(e)),
4     \select(e),
5     \param("inf",0),\param("sup",255),
6     \children(RangeThenProver) ),
7   \prover("alt-ergo",2);
8 proof RangeThenProver: dn3;


The "Wp.range" name identifies the range tactic introduced above. This 
strategy looks for a variable e of type unsigned char (pattern is_uint8(e), cf. 
Line 3) in the goal. If such a pattern is found in goal φ, the tactic range(φ, e, 0, 255)
is applied on φ (cf. Lines 2-5). Otherwise, the Alt-Ergo prover is applied for 2s
(Line 7). The tactic specification language also offers directives to specify which 
strategies shall be applied on the resulting sub-goals. Line 6 above indicates that 
the strategy should be applied recursively. In this way, it enumerates first the 
values of c, then those of d. Indeed, the recursive application to all subgoals in 
this case is equivalent to selecting a first variable of type unsigned char and 
enumerating its values, then for each fixed value, doing so for a second variable 
of type unsigned char (and in this case, there are no more such variables). WP 
takes only ~1s to automatically create the script and prove the lemma, while 
its manual creation would take several minutes. 

Moreover, each sub-goal generated by applying a tactic has a predefined name.
For instance, tactic range(φ, e, a, b) generates a sub-goal named "Lower a" for
case e < a, "Upper b" for case e > b and "Value k" for each case e = k with k ∈
a..b. The user can then specify which strategy shall be used for each generated
sub-goal. More detailed documentation can be found in the WP manual [1].


 1 lemma dn3:
 2   ∀ unsigned char c, d;
 3   (c & 0x8E) == 2 ∧
 4   (c & 0x01) == 1 ∧
 5   (d & 0x8F) == 0
 6   ⇒ ((c+d) & 0x03) == 0x03;
 7 lemma vhm_preserved{L1,L2}:
 8   valid_heap_model{L1} ∧
 9   mem_model_footprint_intact{L1,L2} ∧
10   \at(gNumObjs,L1) == \at(gNumObjs,L2) ∧
11   object_headers_intact{L1,L2}
12   ⇒ valid_heap_model{L2};


Fig. 1. Two ACSL lemmas not proved by automatic prover Alt-Ergo (with a 5 min.
timeout).


1 strategy FastAltErgo: \prover("alt-ergo", 1);   // run Alt-Ergo for 1s
2 strategy EagerAltErgo: \prover("alt-ergo",10);  // run Alt-Ergo for 10s
3 strategy UnfoldVhmThenProver:                   // Strategy with three steps:
4   FastAltErgo,                                  // 1) fast prover attempt
5   \tactic("Wp.unfold",                          // 2) if unproved, unfold
6     \pattern(P_valid_heap_model((..))),         //    predicate valid_heap_model
7     \children(UnfoldVhmThenProver) ),           //    and apply itself recursively
8   EagerAltErgo;                                 // 3) longer prover attempt
9 proof UnfoldVhmThenProver: vhm_preserved;       // Associate strategy to goal


Fig. 2. Strategies to automatically create a proof script for lemma vhm_preserved of
Fig. 1.

The second lemma in Fig. 1 comes from the example in [6] on the proof of 
the JCVM. It was not proved by the Alt-Ergo prover [3] (used in that work) 
and required a proof script. Basically, lemma vhm_preserved deduces predicate
valid_heap_model at label (i.e. program point) L2 from the same predicate at
label L1 (Lines 8, 12 in Fig. 1) if additional conditions are satisfied: the variables
defining the memory state and the number of allocated objects do not change
between labels L1 and L2 (Lines 9-10), and the headers of the allocated objects
(indicating object owner, object size, etc.) do not change between labels⁷ L1 and
L2 either (Line 11). Such lemmas are useful in large verification projects with 
lots of variables: by showing the preservation of values only for a few variables 
between two program points, this lemma allows the tool to deduce the predicate 
of interest at a new program point. The exact definition of predicates is not 
necessary to follow the present paper (and can be found in [6]). 

With the presented extension of WP, the verification engineer can define a
strategy UnfoldVhmThenProver (see Fig. 2) indicating which proof steps should
be applied in order to achieve the proof. First, it calls the Alt-Ergo prover to
check whether the goal can be proved with a short timeout (cf. Lines 4 and
1). If not, Lines 5-7 provide another alternative: to apply the Unfold tactic to
unfold the definition of predicate valid_heap_model (in any part of the goal
and with any number of arguments). Line 7 indicates that after a successful
unfolding, the same strategy should be applied iteratively on the resulting sub-goals
(children). Finally, Line 8 indicates that if the unfolding alternative cannot
be applied anymore, a longer prover attempt is tried (cf. Line 2). This strategy
allows WP to prove the target lemma in ~2s.


7 Labels L1 and L2 can be C labels or predefined ACSL labels [2]. While labels are not
directly preserved in the resulting VCs, the variables at those labels typically have
different names, so it is still possible to match the corresponding values.


5 Industrial Evaluation and Conclusion 


We have applied the presented extension of Frama-C/WP to the proof of the 
real-life JCVM code⁸ (with 8,000+ lines of C and 30,000+ lines of ACSL) at
Thales. The complete proof for 85,000 goals using Alt-Ergo with a 250s timeout 
requires 800+ proof scripts. The new tool saves a very significant effort: after a 
manual creation of strategies (~2 days), WP automatically produces more than 
50% of the required scripts, whose manual creation would take ~1 person-month. 
This effort is estimated by the authors based on the experience of manual proof 
script creation in the industrial context over four years. In this experiment, the 
strategies are created by the same verification engineers who have previously 
created proof scripts. The same strategy is often able to successfully prove several 
dozens of proof goals, which confirms the reusability of strategies for multiple 
goals. 

We summarize our experiment as a two-step workflow. First, the verification 
engineer creates proof strategies. Frequently used tactics (Unfold, Split, etc.) 
may be used as an initial guess with a large timeout in order to maximize proof 
automation. If some goals are still not proved, the engineer uses their experience 
to propose new ones, tuned to failed goals. The generated scripts are then saved 
for a proof replay session. Second, the engineer optimizes the strategies, e.g. 
by optimizing the script generation or replay time. The creation of strategies 
requires skills similar to those needed for the creation of proof scripts.

We believe that an even greater number of proof scripts can be generated 
from strategies, which will strongly facilitate industrial verification. Future steps 
include identification and implementation of further strategy features, and their 
rigorous evaluation on various industrial projects. A detailed analysis of the 
reasons why some goals remained unproven in our experiment on the JCVM 
code will provide a better understanding of the nature of those goals and the 
required additional strategies. Finally, an evaluation of the usability of strategies 
by various categories of users (e.g. verification engineers who are not familiar 
with the target project or with proof scripts in Frama-C) is another future work 
perspective. 
8 Being highly security-critical, this code cannot be shared or included in an artifact.


Abstract. Formal verification of multipliers is difficult. This paper presents
a custom tool, VeSCMul, designed to address this problem. VeSCMul
can be effectively applied to a wide range of hardware verification
challenges, including multipliers with saturation, flags, shifting, truncation,
accumulation, dot product, and even floating-point multiplication.
The tool is highly automated with a user-friendly interface, and it is very
efficient; for instance, verification for designs with 64-bit operands can
finish in seconds. Notably, VeSCMul has been successfully utilized for
both commercial designs and publicly available benchmarks. Regarding
the reliability of its results, VeSCMul itself is fully verified, instilling confidence
in its users for soundness. It also has the option to be used with
a SAT solver for completeness and counterexample generation. Readers
of this paper will gain insights into the capabilities and limitations of
VeSCMul, as well as how to employ it for the verification of their own
designs.


Keywords: Multipliers - Hardware Verification - Formal Methods 


1 Introduction 


Integer multipliers are crucial components in processing units. Ensuring their 
correctness through formal verification is essential; however, historically, veri- 
fying them has proven to be challenging [A416][5]18]. Automated methods like 
SAT solving, BDDs, and computer algebra systems have either failed to scale or 
demonstrated limited applicability in this context [BJs12]16]5]. On the other 
hand, the S-C-Rewriting method has been shown to be very efficient in formally 
verifying a large variety of RTL designs [21]24]25]6]. 

S-C-Rewriting and auxiliary programs are packaged into the VeSCMul tool
(pronounced "vesk-muhl"). VeSCMul is designed to be user-friendly and comprehensive
for sound, fast, and automatic verification of multiplier-centric RTL
designs. It has an improved user interface tailored for non-experts, simplifying 
tool usage. VeSCMul has also introduced the support for fully automatic verifi- 
cation with its new adder detection program. VeSCMul has undergone extensive 
testing on thousands of public benchmarks as well as proprietary industrial de- 
signs at Intel Corporation. Its open-source and free-license status enables others 
to use this tool for similar verification tasks. 

© The Author(s) 2024


B. Finkbeiner and L. Kovács (Eds.): TACAS 2024, LNCS 14570, pp. 340-349, 2024.
This paper presents VeSCMul, and it is outlined as follows. Sec. 2 walks
through a demo for VeSCMul, showing the user interface. Sec. 3 gives an overview
of the tool flow. Sec. 4 lists some of the noteworthy features. Sec. 5 delivers experimental
results on both public and proprietary designs. Sec. 6 discusses related
work and concludes the paper.


2 Installation and a Demo 


VeSCMul is implemented in the ACL2 theorem prover and programming language
[10], and it is fully verified. VeSCMul is open-source with the MIT license,
included as a Community Book in the ACL2 distribution on GitHub, which
can be found at https://github.com/acl2/acl2 under books/projects/vescmul.
Installing ACL2 and building the books will bring along VeSCMul.

Comprehensive and up-to-date documentation for VeSCMul is available as
part of ACL2's manual, accessible at http://acl2.org/manual. This documentation
is extensive, covering thousands of topics from ACL2 sources and Community
Books. Throughout this paper, various documentation topics are referenced
using the notation ":doc <topic>".

Once ACL2 is installed and books are built, users can run a VeSCMul demo
by running the events from Listing 1.1 within an ACL2 interactive session.


Listing 1.1: Simple demo running VeSCMul on a signed 64x64-bit multiplier with 
Booth radix-4 encoding, Dadda tree, and Han-Carlson adder. 


(include-book "projects/vescmul/top" :dir :system)

(vescmul-parse
 :name my-multiplier-example
 :file "DT_SB4_HC_64_64_multgen.sv"
 :topmodule "DT_SB4_HC_64_64")

(vescmul-verify
 :name my-multiplier-example
 :concl (equal RESULT
               (loghead 128 (* (logext 64 IN1)
                               (logext 64 IN2)))))


The first event (include-book) loads VeSCMul and required libraries, which 
takes about a minute. Alternatively, an executable can be created for instant 
loading (see :doc save-exec). The second event parses the
target design, taking a few seconds. The Verilog file is available in the ACL2 git 
repository under the directory. The third event 
uses VeSCMul to verify the design. :concl specifies the con- 
jecture, with RESULT as the output signal name, and IN1 and IN2 as input signal 
names. logext sign-extends bit-vectors (represented as integers), and loghead 
zero-extends or, in other words, truncates them. The inputs are 64-bit signed 
numbers, producing a 128-bit multiplication result. VeSCMul can fully verify 
this design in 1-2 seconds (as tested on a Macbook M1 pro). 
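For readers unfamiliar with these two bit-vector functions, the following Python model may help. It is only a sketch of their behaviour on integers as described above, not ACL2 code; the function spec is a hypothetical name for the right-hand side of the :concl conjecture in Listing 1.1.

```python
def loghead(n, x):
    """Keep the low n bits of x (zero-extension / truncation to n bits)."""
    return x & ((1 << n) - 1)


def logext(n, x):
    """Interpret the low n bits of x as an n-bit two's-complement number."""
    low = loghead(n, x)
    return low - (1 << n) if (low >> (n - 1)) & 1 else low


def spec(in1, in2):
    """Model of (loghead 128 (* (logext 64 IN1) (logext 64 IN2)))."""
    return loghead(128, logext(64, in1) * logext(64, in2))
```

For example, the 64-bit pattern of all ones is read as -1 by logext, so multiplying it by itself gives 1, while multiplying it by 2 gives -2, which loghead turns back into a 128-bit vector.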
The vescmul-parse and vescmul-verify utilities are two LISP macros that
invoke various programs to parse and then verify target designs. 

The vescmul-parse macro packs VL/SV/SVTV utilities to parse Verilog designs
and create symbolic simulation vectors. These utilities are publicly available
and come with the ACL2 installation. They have been developed and used in
industry (i.e., Centaur Technology and Intel Corporation) (see :doc sv).

The vescmul-verify macro gathers the symbolic simulation objects, detects
adder components, applies the S-C-Rewriting algorithm, and optionally utilizes SAT
solving at the end. The program flow is shown in Fig. 1. These steps are explained
as follows.


[Figure 1: flow chart of vescmul-verify, stages (1) to (5). Recoverable labels: the conjecture (equal design_out (* IN1 IN2)); "A fixed form (s-c-form)"; "FGL to invoke a SAT solver (Maybe)".]


Fig. 1: Flow chart of vescmul-verify. (1) User states a conjecture with high- 
level specification. (2) VeSCMul receives a sea of gates from the design. (3) The 
tool identifies and rebuilds half/full-adders in this sea of gates. (4) The design 
and the spec are rewritten with the S-C-Rewriting methodology. (5) If rewriting 
is not conclusive, rewritten conjecture can be passed to FGL for SAT solving. 


(1) Specification is provided by the user, stating a relation between input and 
output signals. This is typically a combination of multiplication (*), addition (+), 
subtraction (-), truncation/zero-extension (loghead), sign-extension (logext), 
part selection (part-select), and possibly user-defined functions. 

(2)(3) The S-C-Rewriting algorithm needs to differentiate and specially rewrite
adder components (e.g., full/half-adders) in a design. In previous work [25,26],
the S-C-Rewriting algorithm was used only for designs whose design hierarchy information
around adders was readily available. VeSCMul has been improved to
now support flattened designs. This is achieved by an internal program that goes
through a sea of gates to identify and mark the adder components before applying
the S-C-Rewriting algorithm. Tests have shown that this program works
very well for successful verification of various architectures (see Sec. 5). Should
the program not identify some adders and the verification attempt fail because
of that, users may also pass hierarchical verification hints (see :doc vescmul).
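To give an intuition for what identifying adders in a sea of gates means, here is a deliberately simplistic Python sketch. It is not VeSCMul's detection program: the netlist encoding (a dictionary from output wire to (operator, inputs)) and the function find_half_adders are assumptions made for this example, and only the most basic half-adder shape (an XOR and an AND over the same two wires) is recognised.

```python
def find_half_adders(gates):
    """Return (inputs, sum_wire, carry_wire) for each XOR/AND pair sharing inputs.

    `gates` maps an output wire name to (operator, input_wires).
    """
    by_inputs = {}
    for out, (op, ins) in gates.items():
        if op in ("xor", "and") and len(ins) == 2:
            # frozenset makes the lookup independent of input order
            by_inputs.setdefault(frozenset(ins), {})[op] = out
    found = [
        (tuple(sorted(ins)), pair["xor"], pair["and"])
        for ins, pair in by_inputs.items()
        if "xor" in pair and "and" in pair
    ]
    return sorted(found)
```

Real flattened designs are considerably harder: synthesis may restructure or share logic, and full-adders span several gates, which is why a missed adder can require a user-provided hierarchy hint.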

(4) When VeSCMul applies S-C-Rewriting, the rewriter tries to rewrite both 
the specification and the design to the same form (i.e., s-c-form [25]), and the two 
sides are compared syntactically. For correct multiplier designs, this is usually 
enough to prove the conjecture. 
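The general principle of rewriting both sides to one normal form and then comparing them syntactically can be sketched as follows. The Python below does not implement the s-c-form or the S-C-Rewriting rules; as a stand-in, it uses a much simpler normal form (flattening and sorting the arguments of + and *), and the term encoding and the name normalize are hypothetical.

```python
def normalize(term):
    """Flatten nested +/* applications and sort their arguments."""
    if not isinstance(term, tuple):
        return term
    op, *args = term
    args = [normalize(a) for a in args]
    if op in ("+", "*"):
        flat = []
        for a in args:
            if isinstance(a, tuple) and a[0] == op:
                flat.extend(a[1:])      # (+ a (+ b c)) becomes (+ a b c)
            else:
                flat.append(a)
        args = sorted(flat, key=repr)   # any fixed total order works
    return (op, *args)


def same_normal_form(lhs, rhs):
    """Syntactic comparison after normalization."""
    return normalize(lhs) == normalize(rhs)
```

Two terms that differ only by associativity or commutativity end up identical after normalization, so a plain equality test suffices; terms with different normal forms are simply left undecided by this check.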

(5) If S-C-Rewriting cannot show the conjecture to be correct, it returns its
rewritten form. Users have the option to automatically use the FGL utility [19]
(see that can bit-blast the rewritten conjecture, perform AIG transformations,
and invoke an external SAT solver like CaDiCaL [1]. FGL is also a
verified program. This can either generate counterexamples for false conjectures,
or help finalize the proofs in some fringe cases. For example, in x86 multiplier 
designs, extra circuitry is used to calculate flags based on multiplication results, 
such as the overflow flag that is set when a certain portion of the result is not
homogeneously 0s or 1s. VeSCMul by itself may not be able to process the extra
flag logic; however, it can rewrite and simplify the multiplication component, 
send the rewritten expression to an external SAT solver through FGL, and final- 
ize such proofs in a matter of seconds or minutes. Note that if the multiplication 
component is not rewritten as intended by S-C-Rewriting, it is unlikely for a 
SAT solver to scale and finish the proofs for operand sizes greater than 16 bits.
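The kind of flag logic mentioned above can be modelled in a few lines of Python. This is a sketch under an explicit assumption: for an n x n-bit signed multiplication, the flag is taken to be set when bits [2n-1 : n-1] of the product are not all equal, which is one common way to detect that the product does not fit in n signed bits. The function names are hypothetical and the exact portion of the result inspected by a given design may differ.

```python
def to_signed(n, x):
    """Read the low n bits of x as a two's-complement number."""
    x &= (1 << n) - 1
    return x - (1 << n) if x >> (n - 1) else x


def signed_mul_with_overflow(n, a, b):
    """Return the 2n-bit product of two n-bit signed operands and an overflow flag."""
    full = (to_signed(n, a) * to_signed(n, b)) & ((1 << (2 * n)) - 1)
    upper = full >> (n - 1)               # bits [2n-1 : n-1], n+1 bits wide
    all_ones = (1 << (n + 1)) - 1
    overflow = upper not in (0, all_ones)  # not homogeneously 0s or 1s
    return full, overflow
```

The flag itself is cheap bit-level logic, but it is computed from the multiplier output, which is why the multiplication part has to be simplified first before a SAT solver can finish the remaining proof.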


4 Notable Features and Compatible Tools 


This section highlights some of the useful and noteworthy features of VeSCMul 
as well as compatible tools. 

Customizable specification: Users can state their own specifications to 
verify various multiplier configurations such as multiply-add, dot product, and 
multipliers with shifted, truncated and/or saturated outputs. 

Automatic adder detection: VeSCMul includes an adder-detection program
that identifies and marks adders before employing the S-C-Rewriting algorithm.
This makes the overall verification procedure fully automatic for a large
variety of multiplier designs (see Sec. 5 for experiments).

End-to-end verified: The author has rigorously verified, using ACL2, that
all of VeSCMul's rewriting operations on given conjectures are sound. Users can
place high confidence in the results when a design is claimed to be correct.
Verifying such a substantial program is a complex process, demanding ACL2
expertise [20,21,22].

Exporting a clean multiplier with design hierarchy: The included 
adder-detection program can be used as a stand-alone feature. Given a flat- 
tened multiplier design, VeSCMul can export a functionally equivalent Verilog 
module with adder components separated as half/full-adder submodules. This 
feature may be particularly useful for researchers addressing the multiplier veri- 
fication problem, where adder detection can be a common challenge [MMMA]. 
For soundness, VeSCMul includes a mechanism for formal equivalence checking 
between the original and exported designs. 
Integration into other verification flows: Proofs generated by VeSCMul
can be integrated into other ACL2-based verification workflows. For instance, 
when verifying floating-point fused-multiply-add (FMA) operations, which often 
involves decomposing the design into integer multiplication and post-multiplica- 
tion parts, VeSCMul can be used for the multiplication part while SAT solving 
can be employed for the rest. Existing and actively used decomposition tool flows 
in ACL2 (see and VeSCMul are compatible. 

Verification of sequential circuits: VeSCMul can handle sequential circuits,
including pipelined designs. Additional key arguments can be provided to
vescmul-parse to verify such designs (see :doc vescmul-parse). Modules with
control logic reusing the same circuitry for various arithmetic operations (e.g.,
see :doc multiplier-verification-demo-2) are also supported.

Waveform generation: VeSCMul is compatible with another tool (see :doc 
svtv-debug$) for generating waveforms in the VCD format. This capability can 
be valuable for pinpointing the cause of bugs in case of counterexamples. 


5 Experiments 


VeSCMul has undergone extensive testing and utilization across various architec- 
tures in both public benchmarks and proprietary x86 processor design projects 
at Centaur Technology and Intel Corporation. 

Various benchmarks are gathered for experiments using publicly available
generators. Summation trees include Dadda (dt), Wallace (wt), 4-to-2
compressor (4:2), array (ar), redundant binary addition (rbat), balanced delay
(bdt), overturned stairs (os) trees. Partial products include signed/unsigned
(s/u) simple (sp), Booth radix-2 (b2), radix-4 (r4), radix-8 (r8), radix-16 (r16)
encodings. Final stage adders include block carry lookahead (bcla), carry lookahead
(cla), carry-select (csel), Ladner-Fischer (lf), carry-skip (csk), conditional
sum (cond), Brent-Kung (bk), ripple-carry (rp), Kogge-Stone (ks), Han-Carlson
(hc), J. Sklansky conditional (jsk) adders.

Table 1 contains a large number of benchmarks to compare the performance
of VeSCMul to other prominent verification tools: AMulet [8] and RevSCA2 [12]
that target n x n-bit multipliers with 2n-bit results. The newest version of AMulet
(AMulet2) timed out for the majority of the benchmarks; the owner has been notified,
and AMulet1 is used in the experiments instead. The results for AMulet1 include
the time to check proof certificates. RevSCA2 is neither a verified program
nor does it produce certificates to check its results. These experimental results
show that VeSCMul scales much better for large multipliers.

In addition to standard input/output sizes (n x n-bit multipliers with 2n-bit
results), Table 2 includes VeSCMul's verification results for variations such
as multiply-add (e.g., 64x64+64), multipliers with asymmetric operand sizes
(e.g., 10x1024), shifted/truncated outputs (e.g., 64x64[95:32] returns the
output bit positions from 32 to 95), and dot product (e.g., 8(16x16)+32 is an
8-point dot product with 16-bit operands accumulating onto a 32-bit number).
Comparable verification tools do not support these configurations. VeSCMul can
fully automatically verify these designs without user hints or SAT solvers.


1 All tests are available at https://temelmertcan.github.io/mult-experiments.html, or
the peer-reviewed artifact is available at https://zenodo.org/records/10048797


Table 1: Proof-time results with success rates for a large set of n x n-bit multipliers

Op. Size    PP    Benchmarks   RevSCA2        AMulet1        VeSCMul
32x32       sp    48           0.5s (77%)     0.4s (100%)    0.5s (100%)
            r2    48           0.8s (62%)     1.4s (100%)    0.7s (100%)
            r4    48           1.4s (87%)     1.3s (91%)     0.6s (100%)
            r8    48           241s (44%)     TO (0%)        0.7s (100%)
            r16   48           TO (0%)        TO (0%)        1.9s (100%)
64x64       sp    54           11s (77%)      1.9s (100%)    1.7s (100%)
            r2    48           17s (62%)      32s (100%)     2.6s (100%)
            r4    240          19s (75%)      4.9s (88%)     2.8s (90%)
            r8    48           1630s (19%)    TO (0%)        2.7s (100%)
            r16   48           TO (0%)        TO (0%)        8s (100%)
128x128     sp    54           83s (65%)      11s (100%)     6.6s (100%)
            r2    48           356s (52%)     928s (100%)    10.1s (100%)
            r4    48           642s (50%)     274s (91%)     8.4s (100%)
            r8    48           TO (0%)        TO (0%)        11s (100%)
            r16   48           TO (0%)        TO (0%)        37s (100%)
256x256     sp    48           2501s (65%)    82s (100%)     27s (100%)
            r4    48           TO (0%)        9529s (91%)    33s (100%)
512x512     r4    6            TO (0%)        TO (0%)        138s (100%)
1024x1024   r4    6            TO (0%)        TO (0%)        776s (100%)

Multiplier sizes range from 32x32 to 1024x1024, grouped wrt. partial product algorithm.
Total of 1032 different benchmarks are used and the timing results of successful proof
attempts are averaged. The tools could not verify all the benchmarks and the success
ratios are given in parentheses. VeSCMul is used only for fully automatic verification
(without a SAT solver), but it can verify the missing cases with user-provided hints.
Time-out (TO) is set to 3600 seconds (1 hour) for up to 128x128; 16200 seconds (4.5
hours) for the rest. Collected on Intel® E-2378G CPU, 32GB memory.


Table 2: Proof-time and memory allocation for various designs

Arch.         Function       Time, Mem     || Arch.         Function        Time, Mem
dt-ub4-bcla   64x64          2.1s, 0.3GB   || 4:2-ub4-cla   64x64           9.7s, 0.7GB
ar-sb4-csel   64x64          2.0s, 0.3GB   || rbat-sb4-lf   64x64           2.4s, 0.3GB
bdt-sb4-csk   64x64          2.4s, 0.3GB   || os-sb4-cond   64x64           1.8s, 0.3GB
dt-ssp-bk     64x64          1.7s, 0.3GB   || ar-usp-rp     64x64           1.0s, 0.2GB
4:2-ub4-ks    64x64          3.7s, 0.5GB   || 4:2-ub8-lf    64x64           3.4s, 0.5GB
dt-sb16-hc    64x64          8.0s, 1.9GB   || wt-ub16-bk    64x64           8.3s, 1.9GB
dt-ssp-bk     128x128        5.9s, 1.0GB   || 4:2-ub4-hc    128x128         13.3s, 1.8GB
wt-usp-lf     256x256        28s, 4.4GB    || dt-sb4-jsk    256x256         27s, 4.4GB
dt-sb4-jsk    512x512        130s, 19GB    || dt-sb4-jsk    1024x1024       725s, 83GB
dt-sb4-ks     10x1024        32s, 5.1GB    || dt-sb4-ks     1024x10         32s, 5.7GB
dt-sb2-bk     64x64+64       2.5s, 0.4GB   || wt-sb4-lf     64x64[63:0]     0.9s, 0.2GB
wt-sb4-lf     64x64[95:32]   1.8s, 0.3GB   || wt-sb4-lf     64x64[127:64]   2.2s, 0.4GB
wt-sb8-bk     8(16x16)+32    2.0s, 0.3GB   || dt-sb4-ks     4(32x64)+128    5.2s, 1.1GB
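The notation used for these configurations can be read as small arithmetic specifications. The Python sketch below models three of them on unsigned integers; the helper names (part_select, multiply_add_spec, dot_product_spec) and the explicit out_bits parameter are assumptions introduced here, since the output width of each benchmark is not stated in the text.

```python
def part_select(x, high, low):
    """Bits high..low of x (inclusive), e.g. [95:32] of a 128-bit product."""
    return (x >> low) & ((1 << (high - low + 1)) - 1)


def multiply_add_spec(a, b, c, out_bits):
    """a*b + c truncated to out_bits, as in a 64x64+64 design."""
    return (a * b + c) & ((1 << out_bits) - 1)


def dot_product_spec(xs, ys, acc, out_bits):
    """Sum of pairwise products plus an accumulator, truncated to out_bits.

    With 8 pairs of 16-bit operands and a 32-bit accumulator this models
    the 8(16x16)+32 configuration (unsigned, for simplicity).
    """
    total = sum(x * y for x, y in zip(xs, ys)) + acc
    return total & ((1 << out_bits) - 1)
```

For instance, part_select(a * b, 95, 32) models the 64x64[95:32] entry: the full product is formed first and only bit positions 32 to 95 are kept.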

Moreover, around 7500 different multiplier designs with diverse architectures, 
operand sizes, operations, truncation, and shifting were randomly generated [23]. 
Overall, VeSCMul achieved a 98% success rate for fully automatic verification 
without hints or SAT solvers. The remaining 2% is mostly made up of multipliers
with a special 7-to-3 compressor tree, and shifted multipliers, but they could still 
be verified by VeSCMul with a user-provided design hierarchy hint. 

VeSCMul has also proven successful in industrial designs, particularly for 
Intel x86 instructions with various functional configurations, including multiply- 
accumulate, dot product, output shifting/truncation, flag calculations based on 
multiplication results, and saturation. In some cases, the assistance of a SAT 
solver becomes necessary (for flags and saturation). These designs can be fully 
verified rapidly and automatically, with results similar to those in the public 
designs. To the best of the author's knowledge, VeSCMul is the first tool to 
achieve comparable verification tasks scalably and automatically. 

Additionally, VeSCMul has played a vital role in the verification flow of
floating-point multiply and fused-multiply-add operations. Verifying these
designs is notably challenging, with no known fully automated verification method.
We employ decomposition techniques [5, 17], where VeSCMul is used for the
multiplication part, significantly reducing manual effort. Full verification of
single- and double-precision operations completes in under an hour.


6 Related Work and Conclusion 


AMulet [8], RevSCA2 [12], and DyPoSUB [14] are other state-of-the-art tools
for multiplier verification. Like VeSCMul, AMulet prioritizes soundness and can
produce proof certificates. In contrast, RevSCA2 and DyPoSUB lack such proofs
or mechanisms, and DyPoSUB has been identified as unsound [9]. Additionally,
these tools primarily focus on verifying n × n-bit multipliers with 2n-bit results.
On the other hand, VeSCMul stands out by offering scalable and automatic 
verification for a broader range of multiplier-centric arithmetic circuits, and it 
allows users to specify their conjectures. These target designs can encompass reg- 
ular multipliers, multiply-add operations, dot products, and operations involving 
shifting, truncation, accumulation, and saturation. 

This paper has showcased VeSCMul for multiplier verification, which has
demonstrated favorable results in experiments involving both public and
proprietary RTL designs. This tool is open-source and compatible with other hardware
verification tools. It has an improved user interface tailored for ACL2 novices.
The tool itself is fully verified, so users can have a high level of confidence in its
soundness. Future work includes adding support for more input formats such as
AIGER and DIMACS CNF (input is currently limited to SystemVerilog), and further
enhancements in automation to handle corner-case designs that currently require
user hints for verification.


Abstract. We present a sound and complete axiomatization of finite
words using matching logic. A unique feature of our axiomatization is 
that it gives a shallow embedding of regular expressions into matching 
logic, and a logical representation of finite automata. The semantics 
of both expressions and automata are precisely captured as matching 
logic formulae that evaluate to the corresponding language. Regular 
expressions are matching logic formulae as is, while the embedding of 
automata is a structural analog—computational aspects of automata are 
captured as syntactic features. We demonstrate that our axiomatization 
is sound and complete by showing that runs of Brzozowski’s procedure for 
equivalence checking correspond to matching logic proofs. We propose this 
as a general methodology for producing machine-checkable formal proofs, 
enabled by capturing structural analogs of computational artifacts in logic. 
The proofs produced can be efficiently checked by the Metamath Zero 
verifier. Work presented in this paper contributes to the general scheme of 
achieving verifiable computing via logical methods, where computations 
are reduced to logical reasoning, encoded as machine-checkable proof 
objects, and checked by a trusted proof checker. 


1 Motivation 


Regular expressions are a powerful lens for studying the description, classification, 
and implementation of regular languages [14]. A typical presentation of the syntax 
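of extended regular expressions (EREs) over a finite alphabet $A$ is as follows:


\[ \alpha ::= \emptyset \mid \epsilon \mid a \in A \mid \alpha_1 \cdot \alpha_2 \mid \alpha_1 + \alpha_2 \mid \alpha^* \mid \neg\alpha \]


where $\epsilon$ is the empty word, $\alpha_1 \cdot \alpha_2$ is concatenation, $\alpha_1 + \alpha_2$ is alternation (aka
choice; sum; union), $\alpha^*$ is the Kleene star, and $\neg\alpha$ is complement. Given a regular
expression $\alpha$, $\mathcal{L}(\alpha)$ is the set of finite words that match $\alpha$.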


A second lens, of finite automata, allows us to view these languages from a 
computational perspective. [14] and [27] show that a language is regular if and 
only if it is accepted by a finite automaton. Besides providing deeper insight into 
the study of languages, this dual viewpoint has practical importance—some tasks 
are easier to tackle when viewed under one lens than another. For example, in 
the implementation of a parser, it is easier to express the desired language as an 
expression, whereas an automaton may be used to recognize that language. Model
checking [13] and runtime monitoring [3] also exploit these dual perspectives.


© The Author(s) 2024 
B. Finkbeiner and L. Kovács (Eds.): TACAS 2024, LNCS 14570, pp. 350–369, 2024.

Much research has been carried out on the logical aspects of regular expressions
and the computational aspects of finite automata. For example, [23] gives an
axiomatization of regular expressions in terms of eleven axioms and two inference
rules, while automata are used extensively to study complexity theory [13].


In this paper, we instead study logical aspects of automata. We present a new 
axiomatization of finite words using matching logic [8]. This axiomatization 
gives us a shallow embedding of regular expressions into matching logic where 
expressions are matching logic formulae as is. Uniquely, we can also represent 
automata as logical formulae. These formulae are a structural analog of the 
automaton—computational aspects such as non-determinism and cycles are 
captured using syntactic constructs such as logical disjunction and fixpoint 
operators. We will compare our shallow embedding with prior work using second- 
order logic, and other formalizations and axiomatizations in Section 2. 


Based on our axiomatization, we propose a general technique for generating
machine-checkable proofs of algorithms that manipulate finite automata. We
show that this technique is practical by generating proofs of equivalence between
regular expressions from a derivative of Brzozowski's method [4], producing
concrete proofs in matching logic's proof system realized in Metamath Zero [6].
As touched on in Section 7, an extension to this work may produce proofs for a
symbolic-execution-based compiler [25], allowing us to trust its correctness.


Work presented here contributes to the scheme of verifiable computing [2] via 
logical methods: computations are reduced to logical reasoning, encoded as 
machine-checkable proofs, and checked by a small trusted checker, thus reducing 
our trust-base to the checker while avoiding the expense of full formal verification. 


The rest of the paper is organized as follows: 


— Section 2 briefly describes prior work in relation to our work. 

— Section 3 reviews regular expressions, automata and related concepts. 

— Section 4 introduces matching logic and presents a model of finite words. 

— Section 5 shows how we may axiomatize this model, and prove equivalent 
regular expressions and automata. 

— Section 6 gives a brief description of our implementation. 

— Section 7 lays out some future avenues for research. 


Detailed proofs may be found in the companion technical report [21]. 


2 Related Work 


Monadic Second-Order Logic (MSO) over Words There is a well-known
connection between MSO and regular languages. Büchi, Elgot, and Trakhtenbrot
showed that MSO formulae and regular expressions are equally expressive [5, 11,
28]. Moreover, the transformation from expressions to formulae and back is easily
computable [26]. Models are sets of labeled positions, representing a word. The
set of models that satisfy a formula gives us its language. For example, the MSO
formula $\bot$ defines the empty language, since no word satisfies it, while
$\exists x.\, P_a(x) \wedge \forall y.\, x = y$ defines the language containing the word $a$. Here, $P_a(x)$
indicates that the letter at position $x$ is $a$. The concatenation of languages may
be defined as:


\[ \exists X.\, \forall y, z.\, (y \in X \wedge z \notin X \rightarrow y < z) \wedge [\alpha_1]_{x \in X} \wedge [\alpha_2]_{x \notin X} \]


Here, $[\varphi]_{\psi(x)}$ denotes the relativization of the formula $\varphi$ to the formula $\psi$, a
transformation that forces it to apply to a particular subdomain of the model.
The translation of Kleene star is even more complex. This connection has been
used, e.g. in the verification of MSO formulae [29].


One concern about this connection between MSO and regular expressions is that 
the translation of expressions is quite involved, including complex auxiliary clauses 
and quantification, as well as the relativization transformation. Our goal here is 
to define a shallow embedding, rather than a translation—regular expressions 
are directly embedded as matching logic with minimal representational distance. 


Salomaa's Axiomatization In [23] Salomaa provides a complete axioma- 
tization of regular expressions that may be used to prove equivalences. This 
axiomatization is specific to unextended regular expressions and does not support 
other representations such as negations in EREs, and finite automata. 


Deep Embeddings of Automata and Languages There are several existing 
formalizations of regular expressions and automata using mechanical theorem 
provers, such as Coq [10] and Isabelle [16]. To the best of our knowledge, all 
these formalizations use deep embeddings. In [10], the authors formalize regular 
expressions and Brzozowski derivatives, with the denotations of regular expres- 
sions defined using a membership predicate. Besides proving the soundness of 
Brzozowski's method, the authors also prove that the process of taking derivatives 
terminates through a notion of finiteness called inductively finite sets. This is 
something that is not likely provable in a shallow embedding like ours. 


Fixpoint Reasoning in Matching Logic We consider our work an extension 
of the work in [9], where the authors begin tackling the problem of fixpoint 
reasoning in matching logic. Their goal was to use matching logic as a unified
framework for fixpoint reasoning across domains. Using a small set of derived
matching logic inference rules, they proved various results in LTL, reachability, 
and separation logic with inductive definitions. We employ many of the techniques 
first described there, but in addition deal with more complex inductive proofs 
and recursion schemes, besides producing formal proof certificates. 


3  Preliminaries 


3.1 Languages, Automata, and Expressions 


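A language is a set of finite sequences over letters of an alphabet. EREs and finite
automata are two ways to represent a class of languages called regular languages.

Definition 1. Let $A = \{a_1, a_2, \ldots, a_n\}$ be a finite alphabet. Then EREs over the
alphabet $A$ are defined using the following grammar:


\[ \alpha ::= \emptyset \mid \epsilon \mid a \in A \mid \alpha \cdot \alpha \mid \alpha + \alpha \mid \alpha^* \mid \neg\alpha \]

The language that an ERE represents, denoted $\mathcal{L}(\alpha)$, is defined inductively:

\[
\begin{aligned}
\mathcal{L}(\emptyset) &= \emptyset \qquad \mathcal{L}(\epsilon) = \{\epsilon\} \qquad \mathcal{L}(a) = \{a\} \\
\mathcal{L}(\alpha_1 + \alpha_2) &= \mathcal{L}(\alpha_1) \cup \mathcal{L}(\alpha_2) \\
\mathcal{L}(\alpha_1 \cdot \alpha_2) &= \{ w_1 \cdot w_2 \mid w_1 \in \mathcal{L}(\alpha_1) \text{ and } w_2 \in \mathcal{L}(\alpha_2) \} \\
\mathcal{L}(\alpha^*) &= \bigcup_{n=0}^{\infty} \mathcal{L}(\alpha^n) \quad \text{where } \alpha^0 = \epsilon \text{ and } \alpha^n = \alpha \cdot \alpha^{n-1} \\
\mathcal{L}(\neg\alpha) &= A^* \setminus \mathcal{L}(\alpha)
\end{aligned}
\]


Since EREs include both complement and choice, other operators like intersection,
subsumption and equivalence are definable as notation. We denote these as
$\alpha \wedge \beta \equiv \neg(\neg\alpha + \neg\beta)$, $\alpha \rightarrow \beta \equiv \neg\alpha + \beta$, and $\alpha \leftrightarrow \beta \equiv (\alpha \rightarrow \beta) \wedge (\beta \rightarrow \alpha)$ respectively.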
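
The inductive definition can be exercised on a finite fragment of each language. The following Python sketch computes the words of $\mathcal{L}(\alpha)$ of length at most k. The tuple encoding of EREs and the names `lang` and `words_up_to` are assumptions made for this illustration only; they are not taken from the paper or its implementation.

```python
from itertools import product

# EREs as nested tuples (a hypothetical encoding chosen for this sketch):
# ("empty",), ("eps",), ("sym", "a"), ("cat", r, s), ("alt", r, s),
# ("star", r), ("not", r)

def words_up_to(alphabet, k):
    """All words over `alphabet` of length at most k, as strings."""
    return {"".join(p) for n in range(k + 1) for p in product(alphabet, repeat=n)}

def lang(r, alphabet, k):
    """The words of L(r) of length at most k, following Definition 1."""
    tag = r[0]
    if tag == "empty":
        return set()
    if tag == "eps":
        return {""}
    if tag == "sym":
        return {r[1]} if k >= 1 else set()
    if tag == "alt":
        return lang(r[1], alphabet, k) | lang(r[2], alphabet, k)
    if tag == "cat":
        left, right = lang(r[1], alphabet, k), lang(r[2], alphabet, k)
        return {u + v for u in left for v in right if len(u) + len(v) <= k}
    if tag == "star":
        base = lang(r[1], alphabet, k)
        result, frontier = {""}, {""}
        while frontier:  # add one more factor per round until nothing new fits
            frontier = {u + v for u in frontier for v in base
                        if len(u) + len(v) <= k} - result
            result |= frontier
        return result
    if tag == "not":  # complement relative to the bounded universe of words
        return words_up_to(alphabet, k) - lang(r[1], alphabet, k)
    raise ValueError(f"unknown ERE node: {tag}")

# (aa)* -> a*a, written with the notation alpha -> beta = (not alpha) + beta
a = ("sym", "a")
aa_star = ("star", ("cat", a, a))
a_star_a = ("cat", ("star", a), a)
implication = ("alt", ("not", aa_star), a_star_a)
```

Because every word of length at most k in a concatenation or star is built from factors of length at most k, truncating at each step loses nothing within the bound.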


Definition 2. A non-deterministic finite automaton (NFA) is a tuple $\mathcal{Q} =
(Q, A, \delta, q_0, F)$, where


— $Q$ is a finite set of states,

— $A$ is a finite set of input symbols called the alphabet,
— $\delta : Q \times A \rightarrow \mathcal{P}(Q)$ is a transition function,

— $q_0 \in Q$ is the initial state, and

— $F \subseteq Q$ is the set of accepting states.


If $\mathrm{range}(\delta)$ has only singleton sets, $\mathcal{Q}$ is a deterministic finite automaton (DFA).
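
As an illustration of Definition 2, the sketch below runs an NFA by tracking the set of states reachable after each letter, and checks the DFA condition. The dictionary representation of $\delta$ and the function names are assumptions of this example, not part of the paper.

```python
def accepts(delta, q0, accepting, word):
    """Run an NFA on `word`, tracking the set of states reachable so far.

    `delta` maps (state, letter) to a set of successor states; missing
    keys are treated as the empty set.
    """
    current = {q0}
    for letter in word:
        current = {q2 for q in current for q2 in delta.get((q, letter), set())}
    return bool(current & accepting)

def is_deterministic(delta, states, alphabet):
    """Definition 2's DFA condition: every transition yields a singleton."""
    return all(len(delta.get((q, a), set())) == 1 for q in states for a in alphabet)

# Hypothetical example: an NFA over {a, b} accepting the words that end in "a".
delta = {(0, "a"): {0, 1}, (0, "b"): {0}}
```

The example automaton is non-deterministic twice over: state 0 has two successors on "a", and state 1 has none.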


3.2 Brzozowski’s Method 


In [4], Brzozowski introduced an operation over languages called its derivative,
denoted $\delta_a(\alpha)$. This operation “consumes” a prefix from each word in the language:


Definition 3. Given a language $L$ and a word $s$, the derivative of $L$ with respect
to $s$ is denoted by $\delta_s(L)$ and is defined as $\{ t \mid s \cdot t \in L \}$.


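For EREs, it turns out that the derivative can also be defined syntactically, as a
recursive function, through the following equalities:


\[
\begin{aligned}
\delta_a(\epsilon) &= \emptyset & \delta_a(\alpha_1 + \alpha_2) &= \delta_a(\alpha_1) + \delta_a(\alpha_2) \\
\delta_a(\emptyset) &= \emptyset & \delta_a(\alpha_1 \cdot \alpha_2) &= \delta_a(\alpha_1) \cdot \alpha_2 + \alpha_1|_\epsilon \cdot \delta_a(\alpha_2) \\
\delta_a(a) &= \epsilon & \delta_a(\alpha^*) &= \delta_a(\alpha) \cdot \alpha^* \\
\delta_a(b) &= \emptyset \ \text{ if } a \neq b & \delta_a(\neg\alpha) &= \neg\delta_a(\alpha) \\
\delta_\epsilon(\alpha) &= \alpha & \delta_{a \cdot w}(\alpha) &= \delta_w(\delta_a(\alpha))
\end{aligned}
\]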


Here, $\alpha|_\epsilon$ is $\epsilon$ if the language of $\alpha$ contains $\epsilon$, and $\emptyset$ otherwise. There are two
properties of derivatives that are important to us. First, every ERE may be
transformed into an equivalent one partitioning its language per the initial letter:

Theorem 1 (Brzozowski Theorem 4.4). Every ERE $\alpha$ can be expressed as:


\[ \alpha = \alpha|_\epsilon + \sum_{a \in A} a \cdot \delta_a(\alpha) \]


Second, repeatedly taking the derivative converges:


Theorem 2 (Brzozowski Theorem 5.2). Two EREs are similar iff they are 
identical modulo associativity, commutativity and idempotency of the + operator. 
Every ERE has only a finite number of dissimilar derivatives. 


These two properties give rise to an algorithm for converting an ERE into a DFA, 
illustrated in Figure 1. The automaton is constructed starting from the root node, 
identifying each node with an ERE. The root node is identified with the original 
ERE. Every node has transitions for each input letter to the node identified 
by the derivative. A state is accepting if its language contains the empty word, 
a property easily checked as a syntactic function of the identifying ERE. This 
process must terminate by Theorem 2, giving us a DFA. We can check if the 
ERE is valid by simply checking that all states are accepting. 
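
The construction just described can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: EREs are encoded as tuples, unions are kept as sets so that states are compared modulo associativity, commutativity and idempotency of +, and the smart constructors additionally apply the sound simplifications ∅ + r = r, ∅·r = ∅ and ε·r = r, which are an addition of this sketch.

```python
EMPTY, EPS = ("empty",), ("eps",)

def sym(a): return ("sym", a)
def star(r): return ("star", r)
def neg(r): return ("not", r)

def alt(r, s):
    """Union, normalized by flattening nested unions into a set."""
    parts = set()
    for t in (r, s):
        parts |= t[1] if t[0] == "alt" else {t}
    parts.discard(EMPTY)
    if not parts:
        return EMPTY
    return next(iter(parts)) if len(parts) == 1 else ("alt", frozenset(parts))

def cat(r, s):
    if EMPTY in (r, s):
        return EMPTY
    if r == EPS:
        return s
    return r if s == EPS else ("cat", r, s)

def implies(r, s): return alt(neg(r), s)
def conj(r, s): return neg(alt(neg(r), neg(s)))
def iff(r, s): return conj(implies(r, s), implies(s, r))

def nullable(r):
    """True iff the language of r contains the empty word."""
    tag = r[0]
    if tag in ("eps", "star"):
        return True
    if tag in ("empty", "sym"):
        return False
    if tag == "alt":
        return any(nullable(t) for t in r[1])
    if tag == "cat":
        return nullable(r[1]) and nullable(r[2])
    return not nullable(r[1])  # tag == "not"

def deriv(a, r):
    """Syntactic Brzozowski derivative of r with respect to the letter a."""
    tag = r[0]
    if tag in ("empty", "eps"):
        return EMPTY
    if tag == "sym":
        return EPS if r[1] == a else EMPTY
    if tag == "alt":
        out = EMPTY
        for t in r[1]:
            out = alt(out, deriv(a, t))
        return out
    if tag == "cat":
        head = cat(deriv(a, r[1]), r[2])
        return alt(head, deriv(a, r[2])) if nullable(r[1]) else head
    if tag == "star":
        return cat(deriv(a, r[1]), r)
    return neg(deriv(a, r[1]))  # tag == "not"

def build_dfa(r, alphabet):
    """Explore all derivatives of r; each distinct normalized ERE is a state."""
    states, todo, delta = {r}, [r], {}
    while todo:
        q = todo.pop()
        for a in alphabet:
            d = deriv(a, q)
            delta[(q, a)] = d
            if d not in states:
                states.add(d)
                todo.append(d)
    return states, delta, {q for q in states if nullable(q)}

def is_valid(r, alphabet):
    """An ERE is valid iff every state of its DFA is accepting."""
    states, _, accepting = build_dfa(r, alphabet)
    return states == accepting
```

Equivalence of two EREs then reduces to validity of `iff(r, s)`, which is the check the proof generation in Section 5 mirrors.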


4 Matching Logic and the Standard Model of Words 


In this section, we will review the syntax and semantics of matching logic and 
present a matching logic model W of finite words. We show how it may be used 
to embed both EREs and finite automata. Matching logic, originally proposed 
in [22], was revised in [8] to include a fixpoint operator. We present a variant,
called polyadic matching logic, omitting sorts since we do not need them.¹


4.1 An Overview of Matching Logic 


Matching logic has three parts: a syntax of formulae, also called patterns; a
semantics, defining a satisfaction relation $\vDash$; and a Hilbert-style proof system,
defining a provability relation $\vdash$. We will only go over the first two, and then
return to matching logic’s proof system in Section 5.


Syntax Matching logic formulae, or patterns, are built from propositional 
operators, symbol applications, variables, quantifiers, and a fixpoint binder. 


Definition 4. Let EVar, SVar, $\Sigma$ be disjoint countable sets. Here, EVar contains
element variables, SVar contains set variables and the signature $\Sigma = \{\Sigma_n\}$ is an
arity-indexed set of symbols. A $\Sigma$-pattern is defined by the grammar:


\[ \varphi ::= \sigma(\varphi_1, \ldots, \varphi_n) \mid \neg\varphi \mid \varphi_1 \vee \varphi_2 \mid \varphi_1 = \varphi_2 \mid \varphi_1 \subseteq \varphi_2 \mid x \mid \exists x.\, \varphi \mid X \mid \mu X.\, \varphi \]

Note that we have assumed more operators than necessary: equality and subset
may be defined in terms of the remaining operators. Please refer to [8] for details.
We assume the usual notation for derived operators such as $\top$, $\forall$, $\wedge$, and $\nu$. Here, $\nu$ is
the greatest fixpoint operator, defined as $\nu X.\, \varphi \equiv \neg\mu X.\, \neg\varphi[\neg X / X]$.

¹ It has since been observed that sorts may be defined axiomatically, and it is
unnecessary to build them into the logic. It is called polyadic to distinguish it from applicative
matching logic with only nullary symbols but includes an explicit application operator.


Semantics: An Informal Overview Matching logic formulae have a pattern
matching semantics. Each pattern $\varphi$ matches a set of elements $|\varphi|$ in the model,
called its interpretation. As an example, consider the naturals $\mathbb{N}$ as a model
with symbols zero and succ. Here, the pattern $\top$ matches every natural, whereas
$\mathrm{succ}(x)$ matches $x + 1$. Conjunctions and disjunctions behave as intersections
and unions: $\varphi \vee \psi$ matches every element that either $\varphi$ or $\psi$ matches.


Unlike first-order logic, matching logic makes no distinction between terms and
formulae. We may write $\mathrm{succ}(x \vee y)$ to match both $x + 1$ and $y + 1$. While
unintuitive at first, this syntactic flexibility allows us to shallowly embed varied
and diverse logics in matching logic with ease. Examples include first-order logic,
temporal logics, separation logic, and many more [8, 7]. Formulae are embedded
as patterns with little to no representational distance, quite often verbatim.


Patterns aren't two-valued as in first-order logic. We can restore the classic
semantics by using the set $M$ to indicate “true” and $\emptyset$ for “false”. The operators
$=$ and $\subseteq$ are predicate patterns: they are either true or false. For example,
$x \subseteq \mathrm{succ}(\top)$ matches every natural if $x$ is non-zero, and no element otherwise.
This allows us to build constrained patterns of the form $\varphi_{\mathrm{structure}} \wedge \varphi_{\mathrm{constraints}}$.
Here, $\varphi_{\mathrm{structure}}$ defines the structure, while $\varphi_{\mathrm{constraints}}$ places logical constraints
on matched elements. For example, the pattern $x \wedge (x \subseteq \mathrm{succ}(\top))$ matches $x$,
but only if it is the successor of some element, i.e. non-zero.


Existential quantification works just as in first-order logic when working over
predicate patterns. Over more general patterns, it behaves as the union over a set
comprehension. For example, the pattern $\exists x.\, x \wedge (x \subseteq \mathrm{succ}(\top))$ matches every
non-zero natural. Finally, the fixpoint operator allows us to inductively build
sets, as in algebraic datatypes or inductive functions. For example, the pattern
$\mu X.\, \mathrm{zero} \vee \mathrm{succ}(\mathrm{succ}(X))$ defines the set of even numbers.


Semantics: A Formal Treatment We will now formally define the semantics
of matching logic. In the interest of brevity we keep things concise. For a more
detailed treatment please refer to [8]. Matching logic patterns are interpreted in
a model, consisting of a nonempty set $M$ of elements called the universe, and an
interpretation $\sigma_M : M^n \rightarrow \mathcal{P}(M)$ for each $n$-ary symbol $\sigma \in \Sigma_n$.


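Definition 5 (Matching logic semantics). An $M$-valuation is a function
$\rho : \mathrm{EVar} \cup \mathrm{SVar} \rightarrow \mathcal{P}(M)$, such that each $x \in \mathrm{EVar}$ evaluates to a singleton. For a
model $M$ and an $M$-valuation $\rho$, the interpretation of patterns is defined as:

\[
\begin{aligned}
|x|_\rho &= \rho(x) \qquad |X|_\rho = \rho(X) \\
|\sigma(\varphi_1, \ldots, \varphi_n)|_\rho &= \bigcup_{a_i \in |\varphi_i|_\rho} \sigma_M(a_1, \ldots, a_n) \\
|\neg\varphi|_\rho &= M \setminus |\varphi|_\rho \\
|\varphi_1 \vee \varphi_2|_\rho &= |\varphi_1|_\rho \cup |\varphi_2|_\rho \\
|\exists x.\, \varphi|_\rho &= \bigcup_{a \in M} |\varphi|_{\rho[a/x]} \\
|\mu X.\, \varphi|_\rho &= \mathrm{lfp}\left( A \mapsto |\varphi|_{\rho[A/X]} \right) \\
|\varphi_1 \subseteq \varphi_2|_\rho &= \begin{cases} M & \text{if } |\varphi_1|_\rho \subseteq |\varphi_2|_\rho \\ \emptyset & \text{otherwise} \end{cases} \\
|\varphi_1 = \varphi_2|_\rho &= \begin{cases} M & \text{if } |\varphi_1|_\rho = |\varphi_2|_\rho \\ \emptyset & \text{otherwise} \end{cases}
\end{aligned}
\]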


For the most part, this definition is as expected. The predicate patterns
evaluate to $M$ if they hold, otherwise to the empty set.
Besides these, patterns have the obvious evaluation: set and element variables
are evaluated according to $\rho$; logical operators are evaluated as the corresponding
set operation; symbols as defined by the model; existentials as the union for $x$
ranging over $M$; and $\mu$ as the least fixpoint of the interpretation of the pattern.
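
To see the definition at work, the following Python sketch evaluates patterns over a small finite model. It is an illustration under stated assumptions: the tuple encoding of patterns, the function `evaluate`, and the truncation of the naturals to {0, ..., 9} (with succ(9) interpreted as the empty set) are choices of this example, and the fixpoint is computed by iterating from the empty set, which is adequate only for finite universes and patterns where the bound variable occurs positively.

```python
from itertools import product

def evaluate(p, universe, symbols, rho):
    """Interpretation |p|_rho as a frozenset, following Definition 5."""
    tag = p[0]
    if tag in ("evar", "svar"):
        return rho[p[1]]
    if tag == "sym":  # pointwise extension of the symbol to argument sets
        arg_sets = [evaluate(q, universe, symbols, rho) for q in p[2:]]
        out = set()
        for args in product(*arg_sets):
            out |= symbols[p[1]](*args)
        return frozenset(out)
    if tag == "not":
        return universe - evaluate(p[1], universe, symbols, rho)
    if tag == "or":
        return (evaluate(p[1], universe, symbols, rho)
                | evaluate(p[2], universe, symbols, rho))
    if tag == "exists":  # union over all singleton valuations of the variable
        out = frozenset()
        for a in universe:
            out |= evaluate(p[2], universe, symbols, {**rho, p[1]: frozenset({a})})
        return out
    if tag == "mu":  # iterate from the empty set until a fixpoint is reached
        current = frozenset()
        while True:
            nxt = evaluate(p[2], universe, symbols, {**rho, p[1]: current})
            if nxt == current:
                return current
            current = nxt
    if tag in ("sub", "eq"):  # predicate patterns: whole universe or empty
        left = evaluate(p[1], universe, symbols, rho)
        right = evaluate(p[2], universe, symbols, rho)
        holds = left <= right if tag == "sub" else left == right
        return universe if holds else frozenset()
    raise ValueError(f"unknown pattern node: {tag}")

# A truncated model of the naturals (an assumption of this sketch).
N = 10
universe = frozenset(range(N))
symbols = {"zero": lambda: {0}, "succ": lambda n: {n + 1} if n + 1 < N else set()}

def conj(p, q): return ("not", ("or", ("not", p), ("not", q)))
TOP = ("not", ("mu", "X", ("svar", "X")))  # bottom is mu X. X; top is its negation

evens = ("mu", "X", ("or", ("sym", "zero"),
                     ("sym", "succ", ("sym", "succ", ("svar", "X")))))
nonzero = ("exists", "x", conj(("evar", "x"),
                               ("sub", ("evar", "x"), ("sym", "succ", TOP))))
```

Evaluating `evens` and `nonzero` in this model reproduces the two informal examples above, restricted to the truncated universe.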


4.2 A Model of Finite Words 


Let us introduce a model $W$ as the standard model of finite words. Define the
signature $\Sigma_{\mathrm{Word}}$ containing constants $\epsilon$ and $a$ for each $a \in A$, and a binary
symbol concat for concatenation. This model allows us to describe languages,
including those of regular expressions and finite automata, as patterns.


Definition 6. Let $W$ be a model for the signature $\Sigma_{\mathrm{Word}}$ with universe the set
of finite sequences over alphabet $A$, and the following interpretations of symbols:


— $\epsilon_W := \{\langle\rangle\}$,

— for each letter $a$, $a_W := \{\langle a \rangle\}$, and
— $\mathrm{concat}_W(s_1, s_2) := \{s_1 \cdot s_2\}$.


Patterns interpreted in model $W$ define languages. $\epsilon$ is interpreted as the singleton
set containing the zero-length word, each letter as the singleton set containing the
corresponding single-letter sequence, and finally, concat as the function mapping
each pair of input words to the singleton containing their concatenation.


We may define the empty language simply as $\bot$. The concatenation of two
patterns gives the concatenation of their languages. Matching logic’s disjunction
allows us to take the union for any languages, while negation gives us the
complement. Finally, we may define the Kleene closure of a language using the
fixpoint operator: $\mu X.\, \epsilon \vee \varphi \cdot X$ gives us the Kleene closure of the language of $\varphi$.
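
As a worked instance of the fixpoint reading (an illustration added here, using only the semantics of Definition 5), take $\varphi = a$ and let $F(S) = \{\langle\rangle\} \cup \{\langle a \rangle \cdot s \mid s \in S\}$ be the function whose least fixpoint interprets $\mu X.\, \epsilon \vee a \cdot X$ in $W$. Iterating from the empty set gives:

\[
\begin{aligned}
F^0(\emptyset) &= \emptyset \\
F^1(\emptyset) &= \{\langle\rangle\} \\
F^2(\emptyset) &= \{\langle\rangle, \langle a \rangle\} \\
F^3(\emptyset) &= \{\langle\rangle, \langle a \rangle, \langle a, a \rangle\} \\
&\;\;\vdots \\
\bigcup_{n \geq 0} F^n(\emptyset) &= \{ a^n \mid n \geq 0 \} = \mathcal{L}(a^*)
\end{aligned}
\]

Since $F$ preserves unions of increasing chains, the union of these approximants is its least fixpoint, so the pattern denotes exactly the Kleene closure.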


A Shallow Embedding of Extended Regular Expressions It is easy to 
define regular expressions as patterns, once we have the following notation: 


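\[ \emptyset \equiv \bot \qquad (\varphi + \psi) \equiv \varphi \vee \psi \qquad \varphi^* \equiv \mu X.\, \epsilon \vee (\varphi \cdot X) \]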


Any ERE taken verbatim is interpreted in model $W$ as its language.
Contrast this to the MSO translation of concatenation, shown in Section 2, and
especially of Kleene star.


Theorem 3. Let $\alpha$ be an ERE. Then $\mathcal{L}(\alpha) = |\alpha|_W$.


[Figure 1: a DFA and its unfolding tree, with each node annotated by its label and pattern.]

Fig. 1: A DFA $\mathcal{Q}$ for the ERE $(aa)^* \rightarrow a^*a$, and its corresponding unfolding
tree. Each node $n$ shows its label $L(n)$, and the pattern $\mathrm{pat}(n)$. The pattern
for the automaton, $\mathrm{pat}_{\mathcal{Q}}$, is that of the root node. Observe that its structure
closely mirrors that of $\mathcal{Q}$.
Accepting nodes include $\epsilon$ as a disjunct, whereas others do not. Starting a cycle
in the graph introduces a fixpoint binder, whereas completing one employs the
bound variable corresponding to that cycle. The major structural differences are
due to duplicate states to allow backlinks and nodes reachable via multiple paths.


4.3 Embedding Automata 


While it is obvious how to embed expressions, the representation of automata,
being computational rather than logical, is less clear. Here, we define a pattern $\mathrm{pat}_{\mathcal{Q}}$
whose interpretation is the language of a finite automaton $\mathcal{Q}$, either deterministic
or non-deterministic. Crucially, this pattern captures not just the language of
the automaton (in Section 2 we mentioned that it is possible to do this in MSO
as well), but also its structure. As shown in Table 1, structural elements of the
automaton map to syntactic elements of the pattern: non-determinism maps to
logical disjunctions; cycles map to fixpoints. This allows us to represent
transformations of automata, such as making a transition, union, or complementation,
as logical manipulations of this pattern in a proof system. This is imperative to
capturing the execution of an algorithm employing these in a formal proof. To
define $\mathrm{pat}_{\mathcal{Q}}$, we must first define the unfolding tree of the automaton $\mathcal{Q}$.


Computational aspect of $\mathcal{Q}$     | Syntactic aspect of $\mathrm{pat}_{\mathcal{Q}}$
----------------------------------+---------------------------------------
Node $n$ is accepting             | $\epsilon$ is a subclause of $\mathrm{pat}(n)$
Non-determinism, union of FAs     | Logical union
Graph cycles                      | Fixpoint binder and its bound variable
Changing the initial node         | Unfolding, framing

Table 1: Structural aspects of $\mathcal{Q}$ become syntactic aspects of $\mathrm{pat}_{\mathcal{Q}}$. This is crucial
to capturing the traces of algorithms as proofs.


Definition 7. For a finite automaton $\mathcal{Q} = (Q, A, \delta, q_0, F)$, its unfolding tree
is a labeled tree $(N, E, L)$ where $N$ is the set of nodes, $E \subseteq A \times N \times N$ is a
labeled edge relation, and $L : N \rightarrow Q$ is a labeling function. It is the tree defined
inductively:


— the root node has label $q_0$,
— if a node $n$ has label $q$ with no ancestors also labeled $q$, then for each $a \in A$
and $q' \in \delta(q, a)$, there is a node $n' \in N$ with $L(n') = q'$, and $(a, n, n') \in E$.


When $\mathcal{Q}$ is a DFA, we use $n_a$ to denote the unique child of node $n$ along edge $a$.
All leaves in this tree are labeled by states that complete a cycle in the automaton.
We define a secondary labeling function, $\mathrm{pat} : N \rightarrow \mathrm{Pattern}$, over this tree.


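Definition 8. Let $(N, E, L)$ be an unfolding tree for $\mathcal{Q} = (Q, A, \delta, q_0, F)$. Let
$X : Q \rightarrow \mathrm{SVar}$ be an injective function. Then, we define pat recursively as follows:
1. For a leaf node $n$, $\mathrm{pat}(n) := X(L(n))$.
2. For a non-leaf node,
a. if $n$ doesn't have a descendant with the same label, then:

\[
\mathrm{pat}(n) := \begin{cases}
\epsilon \vee \bigvee_{(a,n,n') \in E} a \cdot \mathrm{pat}(n') & \text{if } L(n) \text{ is accepting,} \\
\bigvee_{(a,n,n') \in E} a \cdot \mathrm{pat}(n') & \text{otherwise.}
\end{cases}
\]

b. if $n$ has a descendant with the same label, then:

\[
\mathrm{pat}(n) := \begin{cases}
\mu X(L(n)).\, \epsilon \vee \bigvee_{(a,n,n') \in E} a \cdot \mathrm{pat}(n') & \text{if } L(n) \text{ is accepting,} \\
\mu X(L(n)).\, \bigvee_{(a,n,n') \in E} a \cdot \mathrm{pat}(n') & \text{otherwise.}
\end{cases}
\]

Finally, define $\mathrm{pat}_{\mathcal{Q}} := \mathrm{pat}(R)$, where $R$ is the root of this tree.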
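
Definitions 7 and 8 can be combined into a single recursive traversal. The Python sketch below renders $\mathrm{pat}_{\mathcal{Q}}$ as a string without materializing the tree. The function name `pat`, the dictionary encoding of $\delta$, the variable naming `X_q`, and the rendering of an empty disjunction as ⊥ are assumptions of this illustration.

```python
def pat(delta, alphabet, accepting, q, ancestors=()):
    """Return (pattern, labels of leaves below) for a tree node labeled q.

    `delta` maps (state, letter) to a set of successors; `ancestors` is the
    tuple of labels on the path from the root to this node.
    """
    if q in ancestors:  # leaf: this node completes a cycle (Definition 8, case 1)
        return f"X_{q}", {q}
    disjuncts = ["ε"] if q in accepting else []
    back_refs = set()
    for a in alphabet:
        for q2 in sorted(delta.get((q, a), ())):
            child, refs = pat(delta, alphabet, accepting, q2, ancestors + (q,))
            disjuncts.append(f"{a}·{child}")
            back_refs |= refs
    body = " ∨ ".join(disjuncts) if disjuncts else "⊥"
    if q in back_refs:  # case 2b: a descendant carries the same label
        return f"(μX_{q}. {body})", back_refs
    return f"({body})", back_refs  # case 2a

# Hypothetical two-state DFA over {a} accepting words with an even number of a's.
even_delta = {(0, "a"): {1}, (1, "a"): {0}}
even_pattern, _ = pat(even_delta, "a", {0}, 0)
```

For this automaton the result is `(μX_0. ε ∨ a·(a·X_0))`, a structural analog of the two-state cycle and a pattern for the language of $(aa)^*$.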


For nodes of the form (2b), we “name” them by binding the variable $X(L(n))$
using the fixpoint operator. When we return to that state we use the bound
variable to complete a cycle. The use of fixpoints allows us to clearly embody the
inductive structure as a pattern. Figure 1 shows an example of an unfolding tree.
The following theorem shows that this representation of automata is as expected.


Theorem 4. Let $\mathcal{Q}$ be a finite automaton. Then $\mathcal{L}(\mathcal{Q}) = |\mathrm{pat}_{\mathcal{Q}}|_W$.


4.4 Embedding Brzozowski’s Derivative


Besides regular languages, other important constructs may be defined using this
model. Let us look at derivatives, needed to capture Brzozowski's method as a
proof. The Brzozowski derivative of a language $L$ w.r.t. a word $w$ is the set of
words obtainable from a word in $L$ by removing the prefix $w$. Defining this is
quite simple in matching logic: for any word $w$ and pattern $\psi$, we may define its
Brzozowski derivative as the pattern $\delta_w(\psi) \equiv \exists x.\, x \wedge (w \cdot x \subseteq \psi)$.
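
As a small worked example (added for illustration, with a hypothetical pattern $\psi$), suppose $|\psi|_W = \{ab, ac, b\}$ and take $w = a$. The predicate $a \cdot x \subseteq \psi$ holds exactly when $x$ is valuated to $b$ or to $c$, so the existential collects those two witnesses:

\[
|\delta_a(\psi)|_W = \bigcup_{s \in W} \left( \{s\} \cap |a \cdot x \subseteq \psi|_{\rho[s/x]} \right) = \{b\} \cup \{c\} = \{b, c\}
\]

This agrees with Definition 3, since $\{ t \mid a \cdot t \in \{ab, ac, b\} \} = \{b, c\}$.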


This definition is quite interesting because it closely parallels the embedding of
separation logic's magic wand in matching logic: $\varphi \mathrel{-\!\!*} \psi \equiv \exists x.\, x \wedge (\varphi * x \subseteq \psi)$. At
first glance, this seems like a somewhat weak connection, but on closer inspection,
magic wand and derivatives are semantically quite similar: we may think of
magic wand as taking the derivative of one heap with respect to the other.


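It is these connections between seemingly disparate areas of program verification
that matching logic seeks to bring to the foreground. In fact, both derivatives and
magic wand generalize to a matching logic operator called contextual implication:
$C \multimap \psi \equiv \exists \square.\, \square \wedge (C[\square] \subseteq \psi)$ for any pattern $\psi$ and application context $C$ [9].
Using this notion, derivatives and magic wand become $\delta_w(\psi) \equiv (w \cdot \square \multimap \psi)$
and $\varphi \mathrel{-\!\!*} \psi \equiv (\varphi * \square \multimap \psi)$ respectively. This operator has proven key to many
techniques for fixpoint reasoning in matching logic, especially the derived rules
(WRAP) and (UNWRAP) that enable applying Park induction within contexts [9]:


\[
\text{(WRAP)}\ \frac{\Gamma \vdash \varphi \rightarrow (C \multimap \psi)}{\Gamma \vdash C[\varphi] \rightarrow \psi}
\qquad
\text{(UNWRAP)}\ \frac{\Gamma \vdash C[\varphi] \rightarrow \psi}{\Gamma \vdash \varphi \rightarrow (C \multimap \psi)}
\]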


5 Proof Generation 


In the previous section, we showed how we may capture languages as matching
logic patterns. Specifically, automata are captured as patterns that are structural
analogs. In this section, we will demonstrate how we capture runs of algorithms
that manipulate automata as proofs. In particular, we capture runs of Brzozowski's
method using matching logic's Hilbert-style proof system.


This technique is only possible because of the structural similarity between an
automaton $\mathcal{Q}$ and its pattern $\mathrm{pat}_{\mathcal{Q}}$. It gives us the ability to represent computational
transformations on automata as logical transformations of these patterns using
matching logic’s proof system. This section focuses on the theory and proofs
involved. The subsequent section, Section 6, will present our concrete
implementation producing matching logic proofs that can be checked using Metamath
Zero. Let us first introduce matching logic’s proof system, and a theory $\Gamma_{\mathrm{Word}}$
within which we do our reasoning.


5.1 Matching Logic’s Proof System 


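The third component of matching logic is its proof system, shown in Figure 2. It
defines the provability relation, written $\Gamma \vdash \varphi$, meaning that $\varphi$ can be proved
in the proof system with the theory $\Gamma$ as additional axioms.

(PROPOS. 1)   $\varphi \rightarrow (\psi \rightarrow \varphi)$
(PROPOS. 2)   $(\varphi \rightarrow (\psi \rightarrow \theta)) \rightarrow ((\varphi \rightarrow \psi) \rightarrow (\varphi \rightarrow \theta))$
(PROPOS. 3)   $((\varphi \rightarrow \bot) \rightarrow \bot) \rightarrow \varphi$
(MP)          $\dfrac{\varphi \qquad \varphi \rightarrow \psi}{\psi}$
(∃-QUANT.)    $\varphi[y/x] \rightarrow \exists x.\, \varphi$
(∃-GEN.)      $\dfrac{\varphi \rightarrow \psi}{(\exists x.\, \varphi) \rightarrow \psi}$ where $x \notin FV(\psi)$
(PROPAG_⊥)    $C[\bot] \rightarrow \bot$
(PROPAG_∨)    $C[\varphi \vee \psi] \rightarrow C[\varphi] \vee C[\psi]$
(PROPAG_∃)    $C[\exists x.\, \varphi] \rightarrow \exists x.\, C[\varphi]$ where $x \notin FV(C)$
(FRAMING)     $\dfrac{\varphi \rightarrow \psi}{C[\varphi] \rightarrow C[\psi]}$
(PRE-FP)      $\varphi[(\mu X.\, \varphi)/X] \rightarrow \mu X.\, \varphi$
(KT)          $\dfrac{\varphi[\psi/X] \rightarrow \psi}{(\mu X.\, \varphi) \rightarrow \psi}$
(EXISTENCE)   $\exists x.\, x$
(SUBST)       $\dfrac{\varphi}{\varphi[\psi/X]}$
(SINGLETON)   $\neg(C_1[x \wedge \varphi] \wedge C_2[x \wedge \neg\varphi])$

Fig. 2: Matching logic proof system. Here $C, C_1, C_2$ are application contexts, a
pattern in which a distinguished element variable $\square$ occurs exactly once, and
only under applications. We use the notation $C[\varphi] \equiv C[\varphi/\square]$.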


These proof rules fall into four categories. First, the FOL rules provide complete
FOL and propositional reasoning. The (PROPAGATION) rules allow applications
to commute through constructs with a “union” semantics, such as disjunction
and existentials. The proof rule (KNASTER-TARSKI) is an embodiment of the
Knaster-Tarski fixpoint theorem [24], and together with (PREFIXEDPOINT)
corresponds to the Park induction rules of modal logic [18, 15]. Finally, (EXISTENCE),
(SINGLETON), and (SUBST) are technical rules, needed to work with variables.


5.2 A Theory of Finite Words 


We may use a theory $\Gamma$, a set of patterns called axioms, to restrict the models we
consider to those in which every axiom is “true”. We say a pattern $\varphi$ holds in a
model $M$, or that $\varphi$ is valid in $M$, written $M \vDash \varphi$, if its interpretation is $M$ under
all evaluations. For a theory $\Gamma$, we write $M \vDash \Gamma$ if every axiom in $\Gamma$ is valid in
$M$. For a pattern $\psi$, we write $\Gamma \vDash \psi$ if for every model where $M \vDash \Gamma$ we have
$M \vDash \psi$. These axioms also extend the provability relation $\vdash$ defined by the proof
system, allowing us to prove additional theorems. The soundness of matching
logic guarantees that each proved theorem holds in every model of the theory.


Signature: $\epsilon$, $\cdot$, and $a$ for each $a \in A$.
Axioms:
(FUNC_a)       $\exists w.\, a = w$, for each $a \in A$
(FUNC_ε)       $\exists w.\, \epsilon = w$
(FUNC_·)       $\forall u, v.\, \exists w.\, u \cdot v = w$
(ASSOC)        $\forall u, v, w.\, (u \cdot v) \cdot w = u \cdot (v \cdot w)$
(ID_L)         $\forall x.\, (\epsilon \cdot x) = x$
(ID_R)         $\forall x.\, (x \cdot \epsilon) = x$
(NO-CONF_a)    $a \neq b$, for each distinct $a, b \in A$
(NO-CONF_ε)    $\epsilon \neq a$, for each $a \in A$
(NO-CONF_·-1)  $\forall u, v.\, \epsilon = u \cdot v \rightarrow u = \epsilon \wedge v = \epsilon$
(NO-CONF_·-2)  $\forall x, y : \mathrm{Letter}.\, \forall u, v.\, x \cdot u = y \cdot v \rightarrow x = y \wedge u = v$
(DOMAIN)       $\mu X.\, \epsilon \vee (\bigvee_{a \in A} a) \cdot X$

Fig. 3: $\Gamma_{\mathrm{Word}}$: A theory of finite words in matching logic. This theory is complete
for proving equivalence between representations of both automata and extended
regular expressions. Here, $\forall x : \mathrm{Letter}.\, \varphi$ is notation for $\forall x.\, x \in (\bigvee_{a \in A} a) \rightarrow \varphi$.


Figure 3 defines a theory, $\Gamma_{\mathrm{Word}}$, of finite words. The first set of the axioms
in $\Gamma_{\mathrm{Word}}$, (FUNC_σ), gives each symbol a functional interpretation: for an $n$-ary
symbol $\sigma$, the axiom $\forall x_1, \ldots, x_n.\, \exists y.\, \sigma(x_1, \ldots, x_n) = y$ forces the interpretation
$\sigma_M$ to return a single output for any input. This is because element variables are
always interpreted as singleton sets. Next, the (NO-CONF) axioms ensure that
interpretations of symbols are injective modulo AU: they have distinct
interpretations unless their arguments are equal modulo associativity of concatenation
with unit $\epsilon$. Here, $\forall x : \mathrm{Letter}.\, \varphi$ is notation for $\forall x.\, x \in (\bigvee_{a \in A} a) \rightarrow \varphi$, i.e. we
quantify over letters. The axioms (ASSOC), (ID_L), and (ID_R) enforce the
corresponding properties and allow their use in proofs. The final axiom (DOMAIN)
defines our domain to be inductively constructed from $\epsilon$, concatenation and
letters. It is easy to see the standard model $W$ satisfies these axioms, giving us
the theorem, proved in the appendix [21]:


Theorem 5. $W \vDash \Gamma_{\mathrm{Word}}$


The rest of this section is dedicated to showing that $\Gamma_{\mathrm{Word}}$ is complete with respect
to both equivalence of automata and EREs: if two automata or expressions have
the same language, their representations are provably equivalent.


5.3 Proving Equivalence between EREs 


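We are now ready to demonstrate our proof generation method. We will use
it to capture equivalence of expressions using matching logic's Hilbert-style
proof system. Brzozowski's method consists of two parts: converting an ERE
into a DFA $\mathcal{Q}$, and checking that $\mathcal{Q}$ is total. Mirroring this, the proof for
equivalence between EREs, $\Gamma_{\mathrm{Word}} \vdash \alpha \leftrightarrow \beta$, has two parts. First, we prove that
$\Gamma_{\mathrm{Word}} \vdash \mathrm{pat}_{\mathcal{Q}} \rightarrow (\alpha \leftrightarrow \beta)$, i.e. the language of $\alpha \leftrightarrow \beta$ subsumes that of $\mathcal{Q}$. Second,
that $\Gamma_{\mathrm{Word}} \vdash \mathrm{pat}_{\mathcal{Q}}$, i.e. the language of $\mathcal{Q}$ is total. We put these together using
(MODUS-PONENS), giving us $\Gamma_{\mathrm{Word}} \vdash \alpha \leftrightarrow \beta$: the EREs are provably equivalent.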


Proving $\Gamma_{\mathrm{Word}} \vdash \mathit{pat}_Q \to (\alpha \leftrightarrow \beta)$ To prove this, we prove a more general,
inductive, lemma:




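Lemma 1. Let $n$ be a node in the unfolding tree of the DFA $Q$ of the regular
expression $\alpha \leftrightarrow \beta$, where $\alpha$ and $\beta$ have the same language. Then,

$$\Gamma_{\mathrm{Word}} \vdash \mathit{pat}(n)[\Lambda_n] \to \delta_{\mathit{path}(n)}(\alpha \leftrightarrow \beta)$$

where

$$\Lambda_n = \begin{cases}
\lambda, \text{ the empty substitution} & \text{if $n$ is the root node} \\
\Lambda_p[\delta_{\mathit{path}(p)}(\alpha \leftrightarrow \beta)/X(p)] & \text{if $n$ has parent $p$, and $\mathit{pat}(p)$ binds $X(p)$} \\
\Lambda_p & \text{otherwise.}
\end{cases}$$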


The substitution $\Lambda_n$ provides the inductive hypothesis: as we use the (KNASTER-
TARSKI) rule on each $\mu$-binder in $\mathit{pat}_Q$, it replaces the bound variable with the
right-hand side of the goal. The left-hand side then becomes a disjunction of the
form $\epsilon \lor a \cdot \mathit{pat}(n_a)[\Lambda_{n_a}] \lor b \cdot \mathit{pat}(n_b)[\Lambda_{n_b}]$. We decompose the right-hand side into
a similar structure using an important property of derivatives, proved in $\Gamma_{\mathrm{Word}}$:


Lemma 2. For any pattern $\varphi$, $\Gamma_{\mathrm{Word}} \vdash \varphi = ((\epsilon \land \varphi) \lor \bigvee_{a \in A} a \cdot \delta_a(\varphi))$.

The derivatives are reduced to expressions using proved syntactic simplifications:


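Lemma 3. For EREs $\alpha$, $\alpha_1$, $\alpha_2$ and distinct letters $a$ and $b$, the following hold:
– $\Gamma_{\mathrm{Word}} \vdash \delta_a(\emptyset) = \emptyset$; $\Gamma_{\mathrm{Word}} \vdash \delta_a(\epsilon) = \emptyset$;
– $\Gamma_{\mathrm{Word}} \vdash \delta_a(b) = \emptyset$; $\Gamma_{\mathrm{Word}} \vdash \delta_a(a) = \epsilon$;
– $\Gamma_{\mathrm{Word}} \vdash \delta_a(\alpha_1 + \alpha_2) = \delta_a(\alpha_1) + \delta_a(\alpha_2)$;
– $\Gamma_{\mathrm{Word}} \vdash \delta_a(\alpha_1 \cdot \alpha_2) = \delta_a(\alpha_1) \cdot \alpha_2 + (\alpha_1 \land \epsilon) \cdot \delta_a(\alpha_2)$;
– $\Gamma_{\mathrm{Word}} \vdash \delta_a(\neg\alpha) = \neg\delta_a(\alpha)$;
– $\Gamma_{\mathrm{Word}} \vdash \delta_a(\alpha^*) = \delta_a(\alpha) \cdot \alpha^*$.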
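
As an illustration (a hand computation, not an excerpt from the formal development), the identities of Lemma 3 can be chained to compute a derivative. For the expression $(a + b)^*$ over $A = \{a, b\}$, the steps below apply, in order, the star rule, the choice rule, the two letter rules, the unit of choice, and (ID$_L$):

```latex
\begin{align*}
\delta_a\bigl((a+b)^*\bigr)
  &= \delta_a(a+b)\cdot(a+b)^* \\
  &= \bigl(\delta_a(a)+\delta_a(b)\bigr)\cdot(a+b)^* \\
  &= (\epsilon+\emptyset)\cdot(a+b)^* \\
  &= \epsilon\cdot(a+b)^* \\
  &= (a+b)^*
\end{align*}
```

The derivative with respect to $a$ is the original expression again, which is why the unfolding tree for $(a + b)^*$ closes with a back edge after a single step.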


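Proving $\Gamma_{\mathrm{Word}} \vdash \mathit{pat}_Q$ The next part of the proof is a bit more technical,
requiring us to exploit the equivalence $\Gamma_{\mathrm{Word}} \vdash (\mu X.\, \epsilon \lor X \cdot \varphi) \leftrightarrow (\mu X.\, \epsilon \lor \varphi \cdot X)$,
and induct using the (DOMAIN) axiom. This reduces our goal to $\Gamma_{\mathrm{Word}} \vdash \mathit{pat}_Q \cdot (\bigvee_{a \in A} a) \to \mathit{pat}_Q$,
a consequence of the following inductive lemma: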


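Lemma 4. Let $n$ be a node in the unfolding tree of a total DFA $Q$. Then,

$$\Gamma_{\mathrm{Word}} \vdash \mathit{pat}(n)[\Theta_n] \cdot (\bigvee_{a \in A} a) \to \mathit{pat}(n)[U_n]$$

where

$$\Theta_n = \begin{cases}
\lambda, \text{ the empty substitution} & \text{if $n$ is the root node} \\
\Theta_p\bigl[\bigl((\square \cdot (\bigvee_{a \in A} a)) \multimap \mathit{pat}(p)[U_p]\bigr)/X(p)\bigr] & \text{if $n$ has parent $p$, and $\mathit{pat}(p)$ binds $X(p)$} \\
\Theta_p & \text{otherwise}
\end{cases}$$

$$U_n = \begin{cases}
\lambda, \text{ the empty substitution} & \text{if $n$ is the root node} \\
U_p[\mathit{pat}(p)[U_p]/X(p)] & \text{if $n$ has parent $p$, and $\mathit{pat}(p)$ binds $X(p)$} \\
U_p & \text{otherwise}
\end{cases}$$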




def checkValid(φ: Regex, prev: set[Regex] = ∅) → bool:
    if φ ∈ prev: return True
    if ¬hasEWP(φ): return False
    return checkValid(canonicalize(δ_a(φ)), prev ∪ {φ})
       and checkValid(canonicalize(δ_b(φ)), prev ∪ {φ})


Fig. 4: The algorithm instrumented to generate proofs. The canonicalize function
reduces the pattern $\delta_a(\varphi)$ to an ERE, simplifying it to a form where choice
is left-associative, and the idempotency and unit identities have been applied.


Again, $\Theta_n$ gives us the inductive hypothesis, this time in the form of a contextual
implication. To apply it, we leverage a general property about contextual
implications: $\vdash C[C \multimap \varphi] \to \varphi$, allowing us to combine framing with Park induction.
This gives us our main theorem, showing that our axiomatization is complete
with respect to extended regular expressions:


Theorem 6. For any EREs $\alpha$ and $\beta$ with the same language, $\Gamma_{\mathrm{Word}} \vdash \alpha \leftrightarrow \beta$.


5.4 From Expressions to Automata 


Our uniform treatment of automata and expressions as patterns allows us to
apply Brzozowski's method not just to EREs but also to more general patterns.
For example, it can be used to determinize NFAs, or take the complement, union,
or intersection of DFAs. The general principle is the same as above, except instead
of $\alpha \leftrightarrow \beta$, we use a pattern corresponding to the operation we wish to perform.
For example, to prove that the DFA $Q$ has the same language as the intersection
of those of $A$ and $B$, we prove $\Gamma_{\mathrm{Word}} \vdash \mathit{pat}_Q \leftrightarrow (\mathit{pat}_A \land \mathit{pat}_B)$. All we need is
the ability to take the derivative of arbitrary fixpoint patterns, enabled by the
equivalence $\Gamma_{\mathrm{Word}} \vdash \delta_a(\mu X.\, \varphi) \leftrightarrow \delta_a(\varphi[\mu X.\, \varphi / X])$.


6 Implementation and Evaluation 


In this section, we describe the implementation of our method. The algorithm
implemented is shown in Figure 4. It recursively checks that the expression
and its derivatives have the empty-word property, keeping track of when it
has already visited an expression. Here, $\delta_a(\varphi)$ represents a pattern using the
derivative notation, and not the fully simplified regular expression. This notation
is simplified away in the (also instrumented) canonicalize function, which also
normalizes the choice operator to be left-associative and commutes subterms into
lexicographic order, allowing the application of the idempotency and unit axioms.
This results in a canonical representation of expressions modulo similarity.
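
The check described above can be sketched in plain Python. This is a minimal, uninstrumented illustration and not the Maude prototype: the `Re` class, the smart constructors `alt` and `cat`, and the encoding of implication as `alt(neg(x), y)` are assumptions made for this example. Canonicalization is folded into the constructors, which flatten, sort, and deduplicate choice and apply the unit identities.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Re:
    op: str          # 'empty', 'eps', 'sym', 'alt', 'cat', 'star', 'not'
    args: tuple = ()

EMPTY = Re('empty')
EPS = Re('eps')

def sym(c): return Re('sym', (c,))
def star(r): return Re('star', (r,))
def neg(r): return Re('not', (r,))

def cat(r, s):
    # Concatenation with the empty language and unit identities applied.
    if EMPTY in (r, s): return EMPTY
    if r == EPS: return s
    if s == EPS: return r
    return Re('cat', (r, s))

def alt(r, s):
    # Choice modulo associativity, commutativity, idempotency, and unit.
    items = set()
    for x in (r, s):
        items |= set(x.args) if x.op == 'alt' else {x}
    items.discard(EMPTY)
    if not items: return EMPTY
    ordered = sorted(items, key=repr)
    return ordered[0] if len(ordered) == 1 else Re('alt', tuple(ordered))

def has_ewp(r):
    # Empty-word property: does the language of r contain the empty word?
    if r.op in ('eps', 'star'): return True
    if r.op in ('empty', 'sym'): return False
    if r.op == 'alt': return any(has_ewp(x) for x in r.args)
    if r.op == 'cat': return all(has_ewp(x) for x in r.args)
    return not has_ewp(r.args[0])  # 'not'

def deriv(c, r):
    # Brzozowski derivative of r with respect to the letter c.
    if r.op in ('empty', 'eps'): return EMPTY
    if r.op == 'sym': return EPS if r.args[0] == c else EMPTY
    if r.op == 'alt':
        out = EMPTY
        for x in r.args:
            out = alt(out, deriv(c, x))
        return out
    if r.op == 'cat':
        head, tail = r.args
        left = cat(deriv(c, head), tail)
        return alt(left, deriv(c, tail)) if has_ewp(head) else left
    if r.op == 'star': return cat(deriv(c, r.args[0]), r)
    return neg(deriv(c, r.args[0]))  # 'not'

def check_valid(r, alphabet=('a', 'b'), prev=frozenset()):
    # Valid iff every expression reachable by derivatives has the EWP.
    if r in prev: return True
    if not has_ewp(r): return False
    seen = prev | {r}
    return all(check_valid(deriv(c, r), alphabet, seen) for c in alphabet)
```

For instance, `check_valid(star(alt(sym('a'), sym('b'))))` holds, while `check_valid(star(sym('a')))` fails over the alphabet `{a, b}` because the derivative with respect to `b` is the empty language.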


The instrumentation of successful runs of this method produces a proof-hint, an
example of which is shown in Figure 5. A proof-hint is an informal artifact
(der (a + b)*,
  a: (simpl δ_a((a + b)*), der-*, □, α ↦ (a + b); l ↦ a,
     (simpl δ_a(a + b) · (a + b)*, der-∨, □ · (a + b)*, ...,
     (simpl (δ_a(a) + δ_a(b)) · (a + b)*, der-same-letter, (□ + δ_a(b)) · (a + b)*, ...,
     (simpl (ε + δ_a(b)) · (a + b)*, der-diff-letter, (ε + □) · (a + b)*, ...,
     (simpl (ε + ⊥) · (a + b)*, choice-identity-right, □ · (a + b)*, ...,
     (simpl ε · (a + b)*, concat-identity-left, □, ...,
     (backlink (a + b)*))))))),
  b: (simpl δ_b((a + b)*), ...))


Fig. 5: A snippet of a proof-hint for the expression $(a + b)^*$ produced by the
instrumentation. Most substitutions are omitted for brevity. The lemma id der-*
corresponds to the Metamath Zero theorem for $\Gamma_{\mathrm{Word}} \vdash \delta_l(\alpha^*) = \delta_l(\alpha) \cdot \alpha^*$.


containing all the information necessary to produce a formal proof. It is a term 
defined by the following grammar. 


Node := (backlink Pattern)
      | (der Pattern, a : Node, b : Node)
      | (simpl Pattern, LemmaID, Context, Subst, Node)


These terms are more detailed structures than unfolding trees: if we ignore the
simplification nodes, we get an unfolding tree. Each backlink and der node is
labeled by a regular expression, and corresponds to a leaf or an interior node of
an unfolding tree, respectively. In addition, der nodes have child nodes labeled by the patterns
$\delta_a(\varphi)$ and $\delta_b(\varphi)$. Note that these are patterns and not regular expressions: they
use the matching logic notation for derivatives, and are distinct from the fully
simplified EREs. Each simpl node keeps track of the equational simplifications needed
to reduce the derivative notation, and employs associativity, commutativity, and
idempotency of choice to reduce the expression into a canonical form, allowing
the construction of the unfolding tree to terminate. The simpl nodes contain the
name of the simplification applied, the context in which it was applied, as well
as the substitutions with which it was applied. The LemmaID corresponds to a
hand-proved lemma in the Metamath Zero formalization.


To produce the proof of validity, proof-hints are used in three contexts. First, 
to produce the pattern $\mathit{pat}_Q$; next, to produce an instance of Lemma 1; and
finally, to produce an instance of Lemma 4. For each lemma, we inductively 
build up the proof from two manually proven Metamath Zero theorems, one 
for the backlink node case, and another for the der node case. In the case of 
Lemma 1, the simpl nodes are ignored. In the case of Lemma 4 we use them to 
reduce the patterns to their canonical form. This is done by lifting a manually 
Benchmark                   Nodes     .mmb size   Gen. time      Check time
Manual Lemmas               -         307         -              3
(a + b)*                    3         2           64             3
a** → a*                    5         4           82             3
(aa)* → a*a + ε             9         15          179            3
¬(⊤ · a · ⊤) + ¬(b*)        5         5           90             3
match_l(2) / match_l(8)     19 / 43   13 / 266    273 / 27483    3 / 4
match_r(2) / match_r(8)     19 / 43   13 / 228    337 / 21085    3 / 4
eq_l(2) / eq_l(8)           13 / 37   15 / 446    374 / 91661    3 / 5
eq_r(2) / eq_r(8)           13 / 37   15 / 330    368 / 31489    3 / 5

Table 2: Statistics for certificate generation. Sizes are in KiB, times in milliseconds. 
We show the unfolding tree nodes, proof size, generation and checking time. 


proven theorem corresponding to the LemmaID into the context, and applying
the substitution, all supplied by the simpl node.


Trust Base Our trust base consists of the Metamath Zero formalization of the
matching logic proof system, including its syntax and meta-operations for its
sound application, such as substitution and freshness (272 lines); the theory of words
instantiated with $A = \{a, b\}$ (13 lines); and the Metamath Zero proof checker,
mm0-c. Each of these is defined in .mm0 files in our repository [20, 19]. From these,
we prove by hand 354 supporting general theorems and 163 specific to $\Gamma_{\mathrm{Word}}$,
such as Lemmas 1 and 4, and those about derivatives and their simplification.


Evaluation We have evaluated our work against handcrafted tests, as well as
standard benchmarks for deciding equivalence presented in [17]. Some statistics
are shown in Table 2. Each $\mathit{match}_{\{l,r\}}(n)$ test, by [12], is an ERE asserting
that $a^n$ matches $(a + \epsilon) \cdot a^n$, that is, $a^n \to (a + \epsilon) \cdot a^n$. Here $a^n$ indicates $n$-fold
concatenation of $a$, with the $l$ version using concatenation from the left, and the
$r$ version from the right. That is, $a^3$ may be either $((a \cdot a) \cdot a)$ or $(a \cdot (a \cdot a))$. Each
$\mathit{eq}_{\{l,r\}}(n)$ test, by [1], checks whether $a^*$ and $(a^0 + \cdots + a^n) \cdot (a^n)^*$ are equivalent. We also
include property testing using the Hypothesis testing framework. We randomly
generate an ERE $\alpha$, and check that $\alpha \to \alpha$. Our procedure does not optimize for
this, so it allows testing correctness for a variety of expressions, augmenting the
few handcrafted ones and the structurally monotonous benchmarks.


Performance In this work, our goal was to prove that this process is feasible— 
we have not focused on performance. In fact, we find the performance numbers 
here are quite poor. There are a number of reasons for this. 


First, we made some poor implementation choices with reference to instrumenta- 
tion. The prototype uses Maude and its meta-level to produce the instrumentation. 
While Maude's search command collects all the information needed for the proof 
hint, it does not make it accessible. This forced us to repeatedly enter and exit 
the meta-level to collect this information, bringing the running time of, e.g.,
$\mathit{match}_l(8)$ to 27 seconds, compared to 3 ms when implemented idiomatically.


Another reason is that we targeted simplicity, rather than even the most basic
optimizations. For example, when multiple identical nodes occur in an unfolding
tree, we do not reuse the subproofs for identical nodes in the derivative tree, and
instead re-prove the result each time. This causes a significant blow-up in proof
size. We believe that a relatively small engineering effort would greatly improve
performance both in terms of proof size and generation time.


Another issue is that handling machine generated proofs is not one of Metamath 
Zero’s design goals. It is intended as a human-readable language, for human- 
written proofs. We would rather output a succinct binary representation of proofs. 
Although Metamath Zero does allow generation of proofs directly in the mmb 
format, this seems closer to an embedded systems format than a formal language. 


7 Future Work and Conclusion 


Study of Languages Definable in $\Gamma_{\mathrm{Word}}$ While this paper has focused on
regular languages in $\Gamma_{\mathrm{Word}}$, we can define more languages. For example, the
context-free language $\{a^n \cdot b^n \mid n \in \mathbb{N}\}$ may be defined as $a^n \cdot b^n \equiv \mu X.\, \epsilon \lor a \cdot X \cdot b$.
Extending this, we may define $a^n \cdot b^n \cdot c^i$ and $a^i \cdot b^n \cdot c^n$ for $n, i \in \mathbb{N}$ as the
patterns $a^n \cdot b^n \cdot c^*$ and $a^* \cdot b^n \cdot c^n$ respectively. Finally, since patterns are closed
under intersection, we may define the context-sensitive language $a^n \cdot b^n \cdot c^n \equiv
(a^n \cdot b^n \cdot c^*) \land (a^* \cdot b^n \cdot c^n)$. Extensive research has been done regarding languages
definable in fragments of MSO. A corresponding effort for matching logic would
be interesting. Likely, quantifiers and fixpoint operators will allow defining most
computable languages.
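
The construction above can be mimicked on finite sets of strings. The sketch below is only an illustration under a length bound and says nothing about provability in matching logic. The helper names `fix`, `power`, and `concat` are hypothetical: `fix` computes a least fixpoint by Kleene iteration, truncated to words of length at most `max_len`.

```python
def fix(step, max_len):
    # Least fixpoint of a monotone step function, truncated to short words.
    current = set()
    while True:
        nxt = {w for w in step(current) if len(w) <= max_len}
        if nxt == current:
            return current
        current = nxt

def anbn(max_len):
    # mu X. eps \/ a . X . b
    return fix(lambda X: {''} | {'a' + w + 'b' for w in X}, max_len)

def bncn(max_len):
    # mu X. eps \/ b . X . c
    return fix(lambda X: {''} | {'b' + w + 'c' for w in X}, max_len)

def power(letter, max_len):
    # Truncation of letter*.
    return {letter * k for k in range(max_len + 1)}

def concat(left, right, max_len):
    return {u + v for u in left for v in right if len(u + v) <= max_len}

def anbncn(max_len):
    # (a^n b^n c*) /\ (a* b^n c^n), with intersection as set intersection.
    first = concat(anbn(max_len), power('c', max_len), max_len)
    second = concat(power('a', max_len), bncn(max_len), max_len)
    return first & second
```

With a bound of 6, `anbncn(6)` contains exactly the empty word, `abc`, and `aabbcc`.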


Application to Control Flow Graphs (CFGs) Through the K Framework, 
the transition systems of programming languages are defined in matching logic. 
The CFGs of programs in these languages may be viewed as automata. Our 
technique would allow formal proofs of correctness of algorithms over the CFGs 
of programs, such as the semantics-based compiler in [25]. 
Abstract. Petri nets constitute a well-studied model to verify and study
concurrent systems, among others, and computing the coverability set
is one of the most fundamental problems about Petri nets. Using the
proof assistant Coq, we certified the correctness and termination of the
MinCov algorithm by Finkel, Haddad, and Khmelnitsky (FoSSaCS
2020). This algorithm is the most recent algorithm in the literature that
computes the minimal basis of the coverability set, a problem known to
be prone to subtle bugs. Apart from the intrinsic interest of a computer-
checked proof, our certification provides new insights on the MinCov
algorithm. In particular, we introduce as an intermediate algorithm a
small-step variant of MinCov of independent interest.


Keywords: Petri net · Karp-Miller tree algorithm · Minimal coverability
set · Coq · Certified decision procedure


1 Introduction 


Petri nets constitute a well-studied model to verify and study concurrent sys- 
tems, with several applications in other domains, like in chemical [fl] and bi- 
ological process (see for additional applications). Formally, a Petri 
net is given by a finite set of places and a finite set of transitions. Each place 
is marked with a natural number that can be incremented or decremented by 
the transitions. A function that maps places to the marked numbers is called a 
marking. The reachability set of a Petri net from an initial marking is the set of
markings that can be obtained by executing a sequence of transitions from the 
initial marking. 


The central problem about Petri nets is the reachability problem, which consists
in deciding whether a final marking is in the reachability set. Many important
computational problems in logic and complexity reduce or are even equivalent
to this problem [15,31]. The reachability problem is known to be
Ackermann-complete [5,23,6,20]. On positive instances, it can be decided with efficient
directed exploration strategies [B], but general complete algorithms deciding the
problem are complex [24], and require considerable implementation effort [7].


This high complexity is not always a barrier in practice since many problems
related to Petri nets can be decided by introducing an over-approximation of the 
© The Author(s) 2024
B. Finkbeiner and L. Kovács (Eds.): TACAS 2024, LNCS 14570, pp. 370–389, 2024.
https://doi.org/10.1007/978-3-031-57246-3_21


A State-of-the-Art Karp-Miller Algorithm Certified in Coq 371 


reachability set, called the coverability set [18]. This set is defined by introducing
the cover relation over the markings, defined by $x \leq y$ if $x$ is less than or equal to
$y$ component-wise, i.e., on each place. The coverability set is then defined as the
downward closure of the reachability set. It provides a way to decide a variant
of the reachability problem, called the coverability problem. This latter problem
can be solved by computing what is called a basis of the coverability set. Its
definition uses the notion of $\omega$-markings, an extension of markings that allows
marking places with a special symbol denoted by $\omega$ and interpreted as an infinite
number. The well-quasi-order theory [11] shows that any downward-closed set of
markings can be symbolically represented by a finite set of $\omega$-markings, called a
basis. Moreover, this theory also proves that there exists a unique minimal one
for the inclusion relation.


The computation of bases of coverability sets is exactly the purpose of the
Karp-Miller algorithm introduced in [19]. This algorithm inductively computes
trees where nodes are labeled by $\omega$-markings. When the algorithm stops, those
labels form a basis of the coverability set. Karp-Miller algorithms (including
all variants) are not optimal in worst-case complexity for deciding the
coverability problem. In fact, those algorithms have an Ackermannian computational
complexity [8,25] while the coverability problem is known to be
EXPSPACE-complete [28]. There exist other algorithms, based on backward computations
from the final marking, that are optimal in the worst case [4,21]. However,
Karp-Miller algorithms outperform backward computation algorithms in practice (see
[3] for benchmarks). Moreover, the computation of coverability set bases
provides ways to decide properties other than the coverability problem, like the
termination and boundedness problems, as well as some liveness properties. It
follows that this algorithm is central for analyzing Petri nets.


Bases computed by the Karp-Miller algorithm are not minimal (for the
inclusion relation) since they may contain distinct $\omega$-markings $x, y$ with $x \leq y$.
Naturally, the unique minimal basis of the coverability set can be computed
by first invoking the Karp-Miller algorithm, and then applying a simple
reduction algorithm. However, such a computation is not optimal in practice since it
requires computing several $\omega$-markings that will be discarded only at the end
of the computation. A first attempt to avoid this problem was introduced by
Alain Finkel in [9]. This algorithm is an optimization of the original Karp-Miller
algorithm that seems very natural. However, a subtle problem, arising when the
computation is performed on a very particular instance, was discovered only 14 years
later in [10]. Several authors tried to find patches for that bug by proposing
various solutions [13,29,27,30]. Finally, in [12], an efficient algorithm removing
useless basis elements on the fly was proved to be correct with a pen-and-paper
proof. This algorithm, called MinCov, is a state-of-the-art algorithm for
computing the minimal basis of the coverability set. It can be seen as a variant of
the Karp-Miller algorithm based on the new notions of abstractions and
accelerations. Since algorithms à la Karp-Miller are prone to subtle bugs, formal proofs
certified by proof assistants are called for.
Our Contributions.


– We developed a complete formal proof in Coq of the correctness and
termination of the MinCov algorithm, via an intermediate algorithm called
AbstractMinCov. We follow the Coq formalization of Petri nets and
markings introduced in [33], built on top of the Mathematical Components
library (MathComp). This formalization contains several formal proofs
and basic concepts related to Petri nets and markings that we extended to
handle recent notions. Our proofs are based on this code to benefit from
those developments, but also to easily measure the gap between Coq formal
proofs of two algorithms that compute coverability set bases: the original
Karp-Miller algorithm and a state-of-the-art one.

– We provide two new characterizations of the central notion of abstractions
used by the MinCov algorithm: a simple mathematical one, and an
algebraic one that shows that three operators on abstractions (weakening,
contraction, and acceleration) provide a complete set of rules for generating any
abstraction starting from the Petri net transitions. The proof of this result
is based on the Jančar well-quasi-order on executions [17,22].

– We introduce as an intermediate algorithm a small-step variant of MinCov,
called AbstractMinCov. We implemented in Coq proofs of the correctness
and termination of AbstractMinCov. Since the original MinCov
algorithm can be simulated by our algorithm, the proof that the original
MinCov algorithm is correct and terminates is obtained at the cost of a
simple Coq proof. Compared to a direct proof, our approach provides more
succinct proofs in Coq, because proving that some properties are invariant
is usually easier for a small step than for a big step. Additionally, our
algorithm provides room for optimization by decorrelating some transformations
performed by the original algorithm (this is discussed in the conclusion).


Outline. Our Coq formalization of Petri nets, markings, and $\omega$-markings is
given in Section 2, while the one of abstractions and accelerations is given in
Section 3. The Coq model of MinCov is provided in Section 4, and our
small-step algorithm AbstractMinCov is presented in Section 5. The code is
available on Software Heritage [16].


2 Petri Nets 


A Petri net is a tuple $\mathcal{P} = (P, T, \mathit{Pre}, \mathit{Post})$ where $P$, $T$ are two finite sets of
elements called respectively places and transitions, and $\mathit{Pre}$, $\mathit{Post}$ are two mappings
from $T$ to $\mathbb{N}^P$. An element $x \in \mathbb{N}^P$ is called a marking. We denote by $x(p)$ the
value of $x$ at the place $p$. The markings $\mathit{Pre}(t)$ and $\mathit{Post}(t)$, where $t$ is a transition in
$T$, are called respectively the precondition and the postcondition of $t$.

We follow the Coq formalization of Petri nets and markings introduced in
[33]. That formalization was introduced to prove the correctness and termination
of the original Karp-Miller algorithm. It is built on top of the
Mathematical Components library [14] (MathComp). This library provides
finite types (see the Coq keyword finType below), which are a useful type
for Petri net places and transitions, but also functions with finite domain (see
ffun). Markings are conveniently represented by these functions. More precisely,
in our Coq proofs, Petri nets and markings are defined as follows.


Record petri_net :=
  PetriNet
    { place transition : finType;
      _ _ : transition -> {ffun place -> nat}; (* pre, post *)
    }.

Definition marking (pn : petri_net) := {ffun place pn -> nat}.

(* Re-type the 3rd and 4th fields of PN to use the name "marking". *)
Definition pre (pn : petri_net) : transition pn -> marking pn :=
  let: PetriNet _ _ p _ := pn in p.

Definition post (pn : petri_net) : transition pn -> marking pn :=
  let: PetriNet _ _ _ p := pn in p.


Now, let us provide some elements of Petri net semantics. Given a Petri
net $\mathcal{P}$, a transition $t \in T$ is said to be fireable from a marking $x$ if $\mathit{Pre}(t) \leq x$,
where $\leq$ is the component-wise extension of the usual order $\leq$ on $\mathbb{N}$, i.e., $x \leq m$
iff $x(p) \leq m(p)$ for every place $p \in P$. In that case we write $x \xrightarrow{t} y$ where
$y = x - \mathit{Pre}(t) + \mathit{Post}(t)$ is called the marking obtained after firing $t$ from $x$.
We extend the notion of fireability to a sequence $\sigma = t_1 \ldots t_k$ of transitions
$t_1, \ldots, t_k \in T$ by $x \xrightarrow{\sigma} y$ if there exists a sequence $x_0, \ldots, x_k$ of markings such
that $x_0 = x$, $x_k = y$ and $x_{j-1} \xrightarrow{t_j} x_j$ for every $1 \leq j \leq k$. In that case, we say
that $\sigma$ is fireable from $x$ and $y$ is naturally called the marking obtained after
firing $\sigma$ from $x$. When such a sequence $\sigma$ exists, we say that $y$ is reachable from
$x$ (for the Petri net $\mathcal{P}$).
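
The firing rule can be made concrete with a small Python sketch. It is independent of the Coq development: the dictionary encoding of markings and the two-place net `PRE`/`POST` are hypothetical and chosen only for illustration.

```python
from typing import Dict, List

Marking = Dict[str, int]

def fireable(pre: Dict[str, Marking], t: str, x: Marking) -> bool:
    # t is fireable from x iff Pre(t) <= x on every place.
    return all(pre[t][p] <= x[p] for p in x)

def fire(pre: Dict[str, Marking], post: Dict[str, Marking],
         t: str, x: Marking) -> Marking:
    # y = x - Pre(t) + Post(t), defined only when t is fireable from x.
    if not fireable(pre, t, x):
        raise ValueError(f"transition {t} is not fireable from {x}")
    return {p: x[p] - pre[t][p] + post[t][p] for p in x}

def fire_sequence(pre: Dict[str, Marking], post: Dict[str, Marking],
                  sigma: List[str], x: Marking) -> Marking:
    # Fire the transitions of sigma one after the other.
    for t in sigma:
        x = fire(pre, post, t, x)
    return x

# Hypothetical net with places p, q and transitions t1, t2.
PRE = {"t1": {"p": 1, "q": 0}, "t2": {"p": 0, "q": 1}}
POST = {"t1": {"p": 0, "q": 2}, "t2": {"p": 1, "q": 0}}
```

From the marking `{"p": 1, "q": 0}`, firing `t1` and then `t2` first moves the token from `p` into two tokens on `q`, then converts one of them back, giving `{"p": 1, "q": 1}`.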

The Petri net reachability problem consists in deciding, given a Petri net $\mathcal{P}$
and two markings $x, y$, whether $y$ is reachable from $x$. The reachability
problem is Ackermann-complete [5,23,6,20] and algorithms deciding the problem are
complex [24]. However, this high lower bound is not always a barrier in practice
since many problems related to Petri nets can be decided by computing an
over-approximation of the reachability property, called the coverability, obtained by
introducing downward-closed sets.

More formally, the downward closure of a set $M$ of markings is defined as
the set $\{x \in \mathbb{N}^P \mid \exists y \in M,\ x \leq y\}$. We say that $M$ is downward-closed if it is
equal to its downward closure. Downward-closed sets can be finitely represented
by introducing the notion of $\omega$-markings, a notion also known as the ideal
representation of downward-closed sets (see [11] for extra results). We first introduce
the set $\mathbb{N}_\omega$ defined as $\mathbb{N} \cup \{\omega\}$, where $\omega$ is a special symbol not in $\mathbb{N}$ that is
interpreted as an infinite number. This interpretation is defined by extending
the total order $\leq$ over $\mathbb{N}$ into a total order on $\mathbb{N}_\omega$ by $n \leq \omega$ for every $n \in \mathbb{N}_\omega$. An
$\omega$-marking is an element $x \in \mathbb{N}_\omega^P$. In [33] and in our Coq proofs, $\omega$-markings
are defined with the type markingc as follows.


Definition natc := optiontop nat.
(* Here None (also denoted Top) denotes Omega and Some m denotes m *)

Definition markingc := {ffun place -> natc}.


We associate with an $\omega$-marking $x$ the downward-closed set ${\downarrow}x$ of markings
defined as $\{y \in \mathbb{N}^P \mid y \leq x\}$. We also denote by ${\downarrow}B$, where $B$ is a finite set
of $\omega$-markings, the downward-closed set $\bigcup_{x \in B} {\downarrow}x$. Let us recall from the
well-quasi-order theory [11] that any downward-closed set $M$ of markings admits a
finite set $B$ of $\omega$-markings, called a basis of $M$, such that $M = {\downarrow}B$. Bases provide
finite descriptions of downward-closed sets. Naturally a downward-closed set can
have several bases. However, among all the bases of a downward-closed set, the
unique minimal one (for the inclusion relation) can be computed from any basis
as follows. We say that a finite set $B$ of $\omega$-markings forms an antichain if for
every $x, y \in B$ such that $x \leq y$, we have $x = y$. Notice that if $B$ is a basis of
a downward-closed set $M$ that is not an antichain, then there exist $x, y \in B$
such that $x < y$. Since in that case $B \setminus \{x\}$ is also a basis of $M$, it follows that
by recursively removing from $B$ the $\omega$-markings that are strictly smaller than
another one in $B$, we derive from any basis another one that is an antichain. One
can prove that this antichain is the unique minimal basis of $M$ (for the inclusion
relation).
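
The reduction of a basis to an antichain can be sketched as follows. This is an illustration, not the Coq code: it assumes that $\omega$-markings are encoded as tuples whose entries are natural numbers or `math.inf` (standing for $\omega$), and it keeps the maximal elements in one pass, which yields the same result as the recursive removal described above.

```python
import math

OMEGA = math.inf  # assumed encoding of the symbol omega

def leq(x, y):
    # Component-wise order on omega-markings of equal length.
    return all(a <= b for a, b in zip(x, y))

def minimal_basis(basis):
    # Drop every omega-marking that lies strictly below another element.
    distinct = set(basis)
    return {x for x in distinct
            if not any(x != y and leq(x, y) for y in distinct)}
```

For example, `minimal_basis([(1, OMEGA), (0, 5), (2, 0), (1, 3)])` keeps `(1, OMEGA)` and `(2, 0)`: the other two elements lie below `(1, OMEGA)`, while `(2, 0)` is incomparable with it.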

Given a Petri net $\mathcal{P}$, we say that a marking $x \in \mathbb{N}^P$ is coverable from a
marking $x_0$ if there exists a marking $y \geq x$ reachable from $x_0$. The set of coverable
markings is called the coverability set.

Since coverability sets are downward-closed, they can be described by bases.
The computation of such bases is exactly the purpose of Karp-Miller
algorithms. While $\omega$ components were introduced in the original Karp-Miller
algorithm with some algorithmic techniques, this notion was abstracted away
in [12] as a kind of meta-transitions, called accelerations and abstractions. Those
notions are recalled in the next section. They are used to compute the minimal
basis of the coverability set, called the clover in [12]. In our Coq proofs, we
encode the clover as a list of $\omega$-markings (a list is denoted by seq). The definition
uses the coverable predicate defined in [33].


Definition clover (m0 : marking) (l : seq markingc) :=
  antichain l /\
  forall m : marking,
    coverable m0 m <-> exists mc : markingc, (mc \in l) && (m \in mc).

(* perm_eq is the list equivalence modulo permutation *)
Theorem clover_unique m0 (l1 l2 : seq markingc) :
  clover m0 l1 -> clover m0 l2 -> perm_eq l1 l2.
3 Abstractions and Accelerations


Abstractions provide a simple way to explain why some markings can be
covered from other ones. In this section we first recall the definition and semantics
of $\omega$-transitions. Then we introduce the abstractions following the definition
introduced in [12], based on $\omega$-transitions. We show that this rather technical
definition is in fact equivalent to a new simpler one. Whereas the proof of
equivalence between the two definitions is simple, we think that our definition provides
interesting intuitions on abstractions. Finally, in the last part of this section we
show that three operators on abstractions (weakening, contraction, and
acceleration) provide a complete set of rules for generating any abstraction starting
from the Petri net transitions. The proof is based on the Jančar well-quasi-order
on executions [17,22].

Since our Coq proofs for this part are obtained by series of case analyses 
(not complicated but lengthy in Coq), we do not provide additional information 
concerning that part of our implementation. All proofs can be found in the file 
New_transitions.v. 


3.1 ω-Transitions


An $\omega$-transition $t$ is a pair $t = (x, y)$ where $x, y \in \mathbb{N}_\omega^P$ are $\omega$-markings such that
$x(p) = \omega \Rightarrow y(p) = \omega$ for every place $p \in P$. The $\omega$-markings $x$ and $y$ are
respectively denoted by $\mathit{Pre}(t)$ and $\mathit{Post}(t)$ and they are called respectively the
precondition and the postcondition of $t$. This notation provides a natural way to
identify transitions of a Petri net as particular $\omega$-transitions. We implemented
$\omega$-transitions in Coq with the dependent datatype omega_transition as follows.


Definition transitionc := (markingc * markingc)%type.

(* t.pre = Pre(t) and t.post = Post(t) *)
Definition inv_omega_transition (t : transitionc) :=
  [forall p, (t.pre p == None) ==> (t.post p == None)].

Definition omega_transition := { t | inv_omega_transition t }.


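We introduce the operator $\ominus : \mathbb{N}_\omega^P \times \mathbb{N}_\omega^P \to \mathbb{N}_\omega^P$ defined component-wise by
$x \ominus y = 0$ if $x \leq y$, $\omega$ if $x = \omega$ and $y \in \mathbb{N}$, and $x - y$ otherwise. As expected, an
$\omega$-transition $t$ is said to be fireable from an $\omega$-marking $x$ if $\mathit{Pre}(t) \leq x$. In that
case, we write $x \xrightarrow{t} y$ where $y = (x \ominus \mathit{Pre}(t)) + \mathit{Post}(t)$ is called the $\omega$-marking
obtained after firing $t$ from $x$.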

In order to provide a way to manipulate a sequence of $\omega$-transitions as just one
single $\omega$-transition, the notion of hurdle [15], known by the Petri net community
for sequences of transitions, was extended to sequences of $\omega$-transitions [12]. More
formally, we introduce an internal binary operator $\otimes$ on $\omega$-transitions, called the
contraction, as follows:

$$s \otimes t = \bigl( (\mathit{Pre}(t) \ominus \mathit{Post}(s)) + \mathit{Pre}(s),\ (\mathit{Post}(s) \ominus \mathit{Pre}(t)) + \mathit{Post}(t) \bigr)$$
We implemented the contraction operator in Coq and formally proved
the following lemma.


Lemma 1. For all $\omega$-markings $x, z \in \mathbb{N}_\omega^P$, the $\omega$-transition $s \otimes t$ satisfies:

$$x \xrightarrow{s \otimes t} z \iff \exists y \in \mathbb{N}_\omega^P,\ x \xrightarrow{s} y \xrightarrow{t} z$$


In the sequel, given a sequence of $\omega$-transitions $\sigma = t_1 \ldots t_k$, we call the
$\omega$-transition $t = t_1 \otimes \cdots \otimes t_k$ the contraction of $\sigma$ and, when there is no ambiguity,
we identify $\sigma$ with its contraction. It follows that $\mathit{Pre}(\sigma)$ and $\mathit{Post}(\sigma)$ are well
defined.
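
The firing rule for $\omega$-transitions and the contraction operator can be tried out on small examples. The sketch below is independent of the Coq development and assumes that $\omega$-markings are tuples over the naturals and `math.inf`, and that an $\omega$-transition is a pair `(pre, post)` of such tuples; the function names are hypothetical.

```python
import math

OMEGA = math.inf  # assumed encoding of the symbol omega

def monus(x, y):
    # 0 if x <= y, omega if x is omega and y is finite, x - y otherwise.
    if x <= y:
        return 0
    if x == OMEGA:
        return OMEGA
    return x - y

def fire_omega(t, x):
    # Fire the omega-transition t = (pre, post) from the omega-marking x.
    pre, post = t
    if not all(p <= v for p, v in zip(pre, x)):
        raise ValueError("omega-transition is not fireable")
    return tuple(monus(v, p) + q for v, p, q in zip(x, pre, post))

def contract(s, t):
    # Contraction of s followed by t, component by component.
    (pre_s, post_s), (pre_t, post_t) = s, t
    pre = tuple(monus(pt, qs) + ps for ps, qs, pt in zip(pre_s, post_s, pre_t))
    post = tuple(monus(qs, pt) + qt for qs, pt, qt in zip(post_s, pre_t, post_t))
    return pre, post
```

With `s = ((1, 0), (0, 2))` and `t = ((0, 1), (1, 0))`, firing `s` and then `t` from `(1, 0)` gives `(1, 1)`, and so does firing `contract(s, t)` directly, in line with the lemma above.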


3.2 Abstractions 


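Following [12], an abstraction is an $\omega$-transition $a$ such that for all $n \geq 0$, there
exists $\sigma_n \in T^*$ such that for all $p \in P$ with $\mathit{Pre}(a)(p) \in \mathbb{N}$:


– $\mathit{Pre}(\sigma_n)(p) \leq \mathit{Pre}(a)(p)$
– If $\mathit{Post}(a)(p) \in \mathbb{N}$ then $\mathit{Post}(a)(p) + \mathit{Pre}(\sigma_n)(p) \leq \mathit{Post}(\sigma_n)(p) + \mathit{Pre}(a)(p)$
– If $\mathit{Post}(a)(p) = \omega$ then $\mathit{Pre}(\sigma_n)(p) + n \leq \mathit{Post}(\sigma_n)(p)$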


Our Coq implementation of abstractions is a direct translation of the previous
definition. We provide the code just below. In that code, note that seq_to_one 
is a function that maps sequences of transitions to their contractions. Also, we 
provide a simplification of the actual code in which we use the same symbols 
for comparisons and operations independently of whether nat, natc, or a mix 
of the two, are used. Similarly, we assume in the sequel implicit coercions from 
omega_transition, abstraction, or acceleration to transitionc. 


Definition inv_abstraction_aux (t : transitionc) (y : marking * marking)
    (p : place) (n : nat) :=
  mem_nc (t.pre p) (y.pre p)
  /\ (t.post p != None -> t.post p + y.pre p <= t.pre p + y.post p)
  /\ (t.post p == None -> y.pre p + n <= y.post p).

Definition inv_abstraction (t : transitionc) :=
  forall (n : nat), exists (o_n : seq transition), forall (p : place),
    t.pre p != None -> (inv_abstraction_aux t (seq_to_one o_n) p n).

Definition abstraction := { a : omega_transition | inv_abstraction a }.


The previous definition of abstraction is in fact equivalent to the following
simpler one, where $\mathit{Cover}(x, \mathcal{P})$ for some $\omega$-marking $x$ denotes the set of markings
$z$ such that $x \xrightarrow{\sigma} y$ for some word $\sigma$ of transitions and some $\omega$-marking $y \geq z$.


Lemma 2. A given $\omega$-transition $a$ is an abstraction if, and only if, it satisfies

$${\downarrow}\mathit{Post}(a) \subseteq \mathit{Cover}(\mathit{Pre}(a), \mathcal{P}).$$
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Note that this new characterization provides a way to constructively check 
whether an w-transition is an abstraction. This would allow us to declare ab- 
stractions as an eqType in a future work. 


We also recall the following lemma proved in [12]. This result is central for
the correctness of the algorithm MinCov. We implemented its proof in Coq in
the file New_transitions.v.


Lemma 3 (Lemma 1 in [12]). Let $x_0$ be a marking of a Petri net $P$. For
all $\omega$-markings $x, y$ such that $x \xrightarrow{a} y$ for some abstraction $a$, we have:

$$\downarrow\! x \subseteq \mathrm{Cover}(x_0, P) \implies \downarrow\! y \subseteq \mathrm{Cover}(x_0, P)$$


3.3  Abstraction Builder 


In this last part, we show that any abstraction can be built from Petri net
transitions by applying three operators: weakening, contraction, and acceleration.


Let us start with the simplest operator, called the weakening. We introduce
a partial order $\sqsubseteq$ on the $\omega$-transitions defined by $s \sqsubseteq t$ if $\mathrm{Pre}(t) \leq \mathrm{Pre}(s)$
and $\mathrm{Post}(s) + \mathrm{Pre}(t) \leq \mathrm{Post}(t) + \mathrm{Pre}(s)$. The second inequality intuitively means
that the effect of $t$ is larger than or equal to the effect of $s$ (component-wise).
Based on Lemma 2, we deduce that if $t$ is an abstraction and $s$ an $\omega$-transition
such that $s \sqsubseteq t$, then $s$ is also an abstraction. Based on this observation, we
introduce a weakening operator that just replaces an abstraction $t$ by any other
abstraction $s \sqsubseteq t$.
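
The order used by the weakening can be checked place by place. The Python sketch below is a direct reading of the two inequalities; the function name weaker, the encoding of an $\omega$-transition as a (Pre, Post) pair of dictionaries, and math.inf for $\omega$ are assumptions of this illustration, not names from the Coq development.

```python
import math

OMEGA = math.inf  # stands for the symbol omega


def weaker(s, t):
    """Return True when s is below t in the weakening order.

    s and t are (pre, post) pairs of dicts from place to int or OMEGA.
    The test is Pre(t) <= Pre(s) and Post(s) + Pre(t) <= Post(t) + Pre(s),
    taken component-wise, with OMEGA + n = OMEGA.
    """
    s_pre, s_post = s
    t_pre, t_post = t
    return all(
        t_pre[p] <= s_pre[p] and s_post[p] + t_pre[p] <= t_post[p] + s_pre[p]
        for p in t_pre
    )
```

For instance, a transition that needs two tokens in p and gives two back is weaker than one that needs a single token and gives two back, but not the other way around.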

The second simplest operator is the contraction. Based on Lemmas [1] and
we can deduce that if $s$, $t$ are two abstractions, then $s \otimes t$ is also an abstraction.

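The last operator, called the acceleration, associates with an $\omega$-transition $t$
the $\omega$-transition $t^\omega$ that intuitively corresponds to the infinite firing of $t$. More
formally, $t^\omega$ is defined as follows for every place $p \in P$:

$$\mathrm{Pre}(t^\omega)(p) = \begin{cases} \omega & \text{if } \mathrm{Pre}(t)(p) > \mathrm{Post}(t)(p) \\ \mathrm{Pre}(t)(p) & \text{otherwise} \end{cases}$$

$$\mathrm{Post}(t^\omega)(p) = \begin{cases} \omega & \text{if } \mathrm{Pre}(t)(p) \neq \mathrm{Post}(t)(p) \\ \mathrm{Post}(t)(p) & \text{otherwise} \end{cases}$$

In [12], it is proved that if $a$ is an abstraction then $a^\omega$ is also an abstraction.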
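
The case distinction in the definition of the acceleration can be computed place by place. The following Python sketch does so under an assumed encoding (dictionaries from places to integers, with math.inf standing for $\omega$); the names accelerate and is_fixed_by_acceleration are hypothetical and do not come from the Coq files.

```python
import math

OMEGA = math.inf  # stands for the symbol omega


def accelerate(pre, post):
    """Return (Pre, Post) of the acceleration of the transition (pre, post)."""
    acc_pre = {p: OMEGA if pre[p] > post[p] else pre[p] for p in pre}
    acc_post = {p: OMEGA if pre[p] != post[p] else post[p] for p in pre}
    return acc_pre, acc_post


def is_fixed_by_acceleration(pre, post):
    """Place-wise test: Post(t)(p) is either Pre(t)(p) or OMEGA."""
    return all(post[p] in (pre[p], OMEGA) for p in pre)
```

On the sample transition with Pre = {p: 1, q: 0, r: 2} and Post = {p: 0, q: 1, r: 2}, the sketch returns Pre = {p: ω, q: 0, r: 2} and Post = {p: ω, q: ω, r: 2}, and accelerating a second time changes nothing.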


Notice that $t^\omega = t$ if, and only if, $\mathrm{Post}(t)(p) \in \{\mathrm{Pre}(t)(p), \omega\}$ for every $p \in P$.
If $a$ is an abstraction and $a^\omega = a$, we say that $a$ is an acceleration. Since
accelerations play a central role in the MinCov algorithm, we implemented
them in Coq as follows.


Definition inv_accel (t : transitionc) :=
  [forall p, (t.post p == None) || (t.post p == t.pre p)].


Definition acceleration := { a : abstraction | inv_accel a }.


The following Lemma 4 is one of the main results of this section. It shows
that any abstraction can be derived from the Petri net transitions by applying
the previously mentioned operators.


Lemma 4. An $\omega$-transition $a$ is an abstraction if, and only if, there exist $w_0$,
$t_1, w_1, \ldots, t_k, w_k$ where $w_0, \ldots, w_k \in T^*$ and $t_1, \ldots, t_k \in T$ such that:


a E wotiwt ...tywy 


4 The Original MinCov Algorithm


In this section, we present our Coq implementation of the MinCov algorithm.
We tried to be as close as possible to the algorithm introduced in [12], to provide
convincing evidence that it is correct and terminating. We however omitted the
trunc function used in the MinCov pseudocode presented in [12] but not in
their Python implementation. In practice this function differs from the identity
function only when numbers computed by the algorithm are larger than the
number of atoms in the universe.


4.1 Explicit Coverability Trees 


As already mentioned, this algorithm computes the minimal basis of the
coverability set of a Petri net $P$ from an initial $\omega$-marking $x$. Similarly to the original
Karp-Miller algorithm, it computes inductively a tree $\mathcal{T}$ such that nodes are
labeled by $\omega$-markings, and edges by transitions. In the case of MinCov, the
constructed tree, called an explicit coverability tree, contains additional labels
that are explained a bit later. We implement explicit coverability trees in Coq
as the following inductive definition KMTE:


Inductive KMTE := | Empty_E
                  | Br_E of markingc &
                            (seq acceleration) &
                            bool &
                            {ffun transition -> KMTE}.


A node obtained with the constructor Empty_E is called empty, whereas a
node obtained with the constructor Br_E is called valid. The first line of the
constructor Br_E of a valid node $N$ provides the $\omega$-marking denoted by $\lambda(N)$
that labels the node $N$. The fourth line provides a function that inductively
maps each transition $t$ to a subtree. The root node of that subtree is denoted by
$N.t$ and called the child of $N$ following $t$. Given a node, we call the unique word
$\sigma \in T^*$ that labels the edges of the tree from the root to that node the address
of that node. A word $\sigma \in T^*$ is called a valid address if it is the address of a
valid node. This node is denoted by $N_\sigma$ in that case. A node is called a leaf if it
is valid and if $N.t$ is an empty node for every transition $t$.

Compared to trees computed by the Karp-Miller algorithm, explicit
coverability trees computed by the MinCov algorithm have two additional pieces of
information on each valid node, provided by the second and third lines of the
constructor Br_E. First of all, since trees may be partially destroyed when a
subtree corresponding to redundant computations is detected, the computation is
no longer a DFS exploration. In order to keep track of nodes that are waiting for
further exploration, called front nodes, each valid node is marked with a boolean
flag that is set to true when it is a front one. The set of front nodes of an
explicit coverability tree $\mathcal{T}$ is denoted by $\mathrm{Front}(\mathcal{T})$. Last but not least, explicit
coverability trees contain additional information to recover the way the node
labels were generated. To do so, the second line of the constructor Br_E of a
valid node $N$ provides a sequence $a_1 \ldots a_k$ of accelerations denoted by $\mu(N)$.
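
The shape of such a tree, with its four fields and address-based lookup, can be mimicked outside Coq. The Python sketch below is only an analogy under assumptions made here: the class Node, the functions node_at, is_leaf, front_addresses and front_nodes_are_leaves are hypothetical names, accelerations are reduced to plain strings, and an empty node is modeled by a missing child instead of a dedicated constructor.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    """A valid node; an empty node is represented by the absence of a child."""
    label: Dict[str, float]                           # omega-marking of the node
    accs: List[str] = field(default_factory=list)     # accelerations used for the label
    front: bool = True                                # True while awaiting exploration
    children: Dict[str, "Node"] = field(default_factory=dict)  # transition -> subtree


def node_at(root: Node, address: Tuple[str, ...]) -> Optional[Node]:
    """Follow the transitions of the address from the root; None if not valid."""
    node: Optional[Node] = root
    for t in address:
        if node is None:
            return None
        node = node.children.get(t)
    return node


def is_leaf(node: Node) -> bool:
    """A valid node is a leaf when every child is empty."""
    return not node.children


def front_addresses(root: Node, prefix: Tuple[str, ...] = ()) -> List[Tuple[str, ...]]:
    """Addresses of all front nodes, in depth-first order."""
    found = [prefix] if root.front else []
    for t, child in root.children.items():
        found += front_addresses(child, prefix + (t,))
    return found


def front_nodes_are_leaves(root: Node) -> bool:
    """Check that no front node has a valid child."""
    return all(is_leaf(node_at(root, ad)) for ad in front_addresses(root))
```

In this encoding an address is a tuple of transition names, so the root has the empty tuple as address.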

In our implementation, we prove that the following properties (called
invariant properties in the sequel) are maintained throughout any execution of the
algorithm.


- Front nodes are always leaves (predicate Front_leaves).
- Non-front node labels form an antichain (predicate Not_Front_Antichain).
- The root node is valid, and $x_0 \xrightarrow{\mu(N_\varepsilon)} \lambda(N_\varepsilon)$ (predicate consistentE_head).
- If a valid node $N$ is not the root, i.e. $N = N'.t$ for some node $N'$ and some
  transition $t$, then $\lambda(N') \xrightarrow{t\,\mu(N)} \lambda(N)$ (predicate consistentE_tree).


4.2 Step Relation 


The MinCov algorithm is a while loop algorithm that updates a pair $(\mathcal{T}, A)$,
where $\mathcal{T}$ is an explicit coverability tree, and $A$ is a (finite) sequence of
accelerations. Accelerations that occur in $\mathcal{T}$ (in the $\mu$ labeling) are taken from $A$.
Moreover, the sequence $A$ can only grow with newly discovered accelerations.
Initially, the MinCov algorithm begins with the pair $(\mathcal{T}, A)$ where $A$ is the empty
sequence $\varepsilon$ and $\mathcal{T}$ is the explicit coverability tree reduced to a single valid front
node $N_\varepsilon$ labeled by $\lambda(N_\varepsilon) = x_0$ and $\mu(N_\varepsilon) = \varepsilon$. The algorithm picks
nondeterministically a front node at each iteration of the while loop to transform the tree.
It terminates when the set of front nodes is empty and, at that point, returns the
current $\mathcal{T}$ (the set $A$ is discarded at the end). Our Coq implementation of this
algorithm is defined by introducing a binary relation Rel on those pairs $(\mathcal{T}, A)$.
Such a one-step encoding provides all the possible nondeterministic behaviors of
the algorithm. It follows that our proofs of correctness and termination are valid
whatever exploration heuristic is implemented.
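
The idea that a one-step relation covers every exploration heuristic can be pictured with a generic driver loop. The Python sketch below is a toy under stated assumptions: run, dec and halve are hypothetical names, the state is a plain integer rather than a pair made of a tree and accelerations, and the two operations merely stand in for the real ones. It only shows that the strategy (the pick argument) is separated from the relation (the operations).

```python
def run(state, operations, pick):
    """Apply enabled steps until none is left; pick resolves the nondeterminism."""
    while True:
        # Each operation returns the list of states reachable in one step.
        successors = [nxt for op in operations for nxt in op(state)]
        if not successors:
            return state
        state = pick(successors)


def dec(n):
    """Toy step: decrement a positive integer."""
    return [n - 1] if n > 0 else []


def halve(n):
    """Toy step: halve a positive even integer."""
    return [n // 2] if n > 0 and n % 2 == 0 else []
```

With this toy relation, run(12, [dec, halve], min) and run(12, [dec, halve], max) follow different paths but both stop at 0, the only state without successor.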

Formally, the relation Rel is defined as follows, with three constructors
Rel_clean, Rel_accel, and Rel_explo that are defined later in this section:


Variant Rel :
  (KMTE * seq acceleration) -> (KMTE * seq acceleration) -> Prop :=
  | Rel_clean [...]  (* cleaning operation *)
  | Rel_accel [...]  (* accelerating operation *)
  | Rel_explo [...]  (* exploring operation *).


As will be discussed later, the termination of the MinCov algorithm is
proved by certifying that the relation Rel is well-founded. For that reason,
Rel (T',A') (T,A) corresponds to a step of the MinCov algorithm from $(\mathcal{T}, A)$
to $(\mathcal{T}', A')$, and not the other way around.

One central notion of the algorithm is the definition of saturated $\omega$-markings.
An $\omega$-marking $x$ is saturated for a sequence $A$ of accelerations if, for every
acceleration $a \in A$ such that $x \xrightarrow{a} y$ for some $\omega$-marking $y$, we have $x = y$. When an
$\omega$-marking is not saturated for a sequence $A$, it can be saturated with respect to
$A$ as follows. Note that in general, given two $\omega$-markings $x, y$ such that $x \xrightarrow{a} y$
for some acceleration $a$, then $y(p) \in \{x(p), \omega\}$ for every place $p$. It means that $y$
is obtained from $x$ by setting to $\omega$ some places of $x$. In particular, if $x \neq y$, then
the number of places with natural numbers is strictly decreasing from $x$ to $y$.
It follows that an algorithm that tries to apply in a round-robin fashion all the
accelerations in $A$ eventually terminates on a fixed point in at most $|P|$ rounds.
We implement this algorithm in Coq with a function saturate_KMTree A T ad
that takes as input a sequence $A$ of accelerations, an explicit coverability tree $\mathcal{T}$,
and a valid address $\sigma \in T^*$ (denoted by ad), and returns the explicit coverability
tree obtained from $\mathcal{T}$ by saturating $\lambda(N_\sigma)$ with respect to $A$, and by
appending to $\mu(N_\sigma)$ the sequence of accelerations used by the round-robin saturation
algorithm.
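
The round-robin saturation can be sketched on bare $\omega$-markings, leaving the tree aside. The Python code below is an illustration under assumptions, not the function saturate_KMTree: the names fire_acceleration and saturate are hypothetical, math.inf stands for $\omega$, an acceleration is a (Pre, Post) pair of dictionaries, and the firing rule assumes the usual enabling condition that Pre is below the current marking, combined with the observation above that firing an acceleration only turns some places into $\omega$.

```python
import math

OMEGA = math.inf  # stands for the symbol omega


def fire_acceleration(x, acc):
    """Fire an acceleration from x, or return None when it is not enabled."""
    pre, post = acc
    if any(x[p] < pre[p] for p in x):
        return None
    # For an acceleration, Post(p) is either Pre(p) or OMEGA, so a place is
    # either left unchanged or set to OMEGA.
    return {p: OMEGA if post[p] == OMEGA else x[p] for p in x}


def saturate(x, accelerations):
    """Apply the accelerations in rounds until a fixed point is reached.

    Returns the saturated marking and the indices of the accelerations that
    changed the marking, in the order in which they were used.
    """
    used = []
    changed = True
    while changed:
        changed = False
        for i, acc in enumerate(accelerations):
            y = fire_acceleration(x, acc)
            if y is not None and y != x:
                x, changed = y, True
                used.append(i)
    return x, used
```

Each effective step turns at least one finite place into $\omega$, which is why the loop stops after a number of effective steps bounded by the number of places.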

The MinCov algorithm is implemented in such a way that the labels of the
non-front valid nodes form an antichain. To enforce that property, the cleaning
operation takes as input two explicit coverability trees $\mathcal{T}$ and $\mathcal{T}'$, a sequence $A$
of accelerations, and an address $\sigma$ (denoted by ad below), and checks if $\sigma$ is
the address of a front node, if $\mathcal{T}'$ is the tree obtained from $\mathcal{T}$ by saturating $N_\sigma$
with respect to $A$ (see above), and if there exists a non-front node $N'$ such that
$\lambda(N_\sigma) \leq \lambda(N')$ in $\mathcal{T}'$ (predicate ad_covered_not_front T' ad below). In that
case, the cleaning operation puts in the relation Rel the pair $(\mathcal{T}, A)$ with $(\mathcal{T}'', A)$,
where $\mathcal{T}''$ is obtained from $\mathcal{T}'$ by removing the node at address $\sigma$ (implemented
by removeE_add T' ad).


Rel_clean (T:KMTE) A ad T': Is_Front T ad
  -> T' = saturate_KMTree A T ad
  -> ad_covered_not_front T' ad
  -> Rel (removeE_add T' ad, A) (T,A)
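
The test behind the cleaning operation, namely whether the label of a front node is below the label of some non-front node, can be sketched as follows. This Python code is an assumption-laden illustration rather than the Coq predicate: the tree is flattened into a dictionary from addresses (tuples of transition names) to (label, is_front) pairs, and the names leq and covered_by_non_front are hypothetical.

```python
import math

OMEGA = math.inf  # stands for the symbol omega


def leq(x, y):
    """Component-wise order on omega-markings."""
    return all(x[p] <= y[p] for p in x)


def covered_by_non_front(tree, ad):
    """True when the label at address ad is below the label of a non-front node.

    tree maps addresses to (label, is_front) pairs.
    """
    label, _ = tree[ad]
    return any(
        not is_front and leq(label, other)
        for other_ad, (other, is_front) in tree.items()
        if other_ad != ad
    )
```

When the test succeeds, the front node carries no new information and the cleaning operation removes it.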


When the previous cleaning operation cannot be applied on a front node
with address $\sigma$ (~~ denotes the negation, and ad and ad' in the code refer
to $\sigma$ and $\sigma'$), the algorithm checks if this front node, once saturated, is
labeled by an $\omega$-marking larger than the label of an ancestor with address $\sigma'$
(through the predicate Possible_acceleration, which also checks that $\sigma'$ is
a prefix of $\sigma$). If so, an accelerating operation is performed. It consists first in
computing the acceleration corresponding to the path between the two nodes.
More precisely, computingE_acceleration T' ad' ad computes the
acceleration $a = (t_1 \sigma_1 \ldots t_k \sigma_k)^\omega$, where $\sigma = \sigma' t_1 \ldots t_k$ for a sequence $t_1 \ldots t_k$ of
transitions, and $\sigma_1, \ldots, \sigma_k$ are the sequences of accelerations that occur in $\mathcal{T}'$ from $\sigma$
to $\sigma'$, i.e. $\sigma_j = \mu(N_{\sigma' t_1 \ldots t_j})$. In that case, the accelerating operation puts in the
relation Rel the pair $(\mathcal{T}, A)$ with $(\mathcal{T}'', A')$, where $A'$ is the sequence obtained by
adding $a$ to $A$, and $\mathcal{T}''$ is obtained from $\mathcal{T}'$ by removing the subtree of $\mathcal{T}'$ from
$N_{\sigma'}$ and by setting that node as a front node (to_FrontE T' ad' below).


Rel_accel (T:KMTE) A ad T' ad' a: Is_Front T ad
  -> T' = saturate_KMTree A T ad
  -> ~~ ad_covered_not_front T' ad
  -> Possible_acceleration T' ad' ad
  -> a = computingE_acceleration T' ad' ad
  -> Rel (to_FrontE T' ad', a :: A) (T,A)


When the previous cleaning and accelerating operations cannot be applied on
a front node (tested through No_Possible_acc for the accelerating operation),
the algorithm performs an exploration from that front node by trying to fire all
the transitions from the label of that node. This label $x$ is computed after
saturation via the function m_from_add, from the tree and the address $\sigma$ (denoted
by ad below) of the node. The exploring operation (see Rel_explo below) puts
in the relation Rel the pair $(\mathcal{T}, A)$ with $(\mathcal{T}''', A)$, where $\mathcal{T}''$ is the tree obtained
from $\mathcal{T}'$ by removing valid nodes labeled by an $\omega$-marking smaller than $x$
(implemented by removeE_strict_covered T' x), and $\mathcal{T}'''$ is obtained from $\mathcal{T}''$
by removing the node at address $\sigma$ from the front list, and by creating, for each
transition $t$ such that there exists an $\omega$-marking $y$ such that $x \xrightarrow{t} y$, a front node
$N_{\sigma t}$ labeled by $\lambda(N_{\sigma t}) = y$ and $\mu(N_{\sigma t}) = \varepsilon$ (this last operation is implemented
by Front_extensionE).


Rel_explo (T:KMTE) A ad T' mc: Is_Front T ad
  -> T' = saturate_KMTree A T ad
  -> ~~ ad_covered_not_front T' ad
  -> No_Possible_acc T' ad
  -> Some mc = m_from_add T' ad
  -> Rel (Front_extensionE (removeE_strict_covered T' mc) ad, A) (T,A)


5 The AbstractMinCov Algorithm


The Coq proofs of correctness and termination of the MinCov algorithm are
obtained by introducing a variant of that algorithm, called AbstractMinCov.
This new algorithm takes a small-step approach obtained by decomposing the
three main operations (cleaning, accelerating, and exploring) of the original
MinCov into sequences of five small-step operations presented in this section.

We implemented in Coq a formalization of AbstractMinCov and proved
the correctness and termination of that algorithm. Since the original MinCov
algorithm can be simulated by our algorithm, we obtain at the cost of a simple
Coq proof of simulation that the original MinCov algorithm is correct and
terminates. Compared to a direct proof, our approach provides more succinct
proofs in Coq, because proving that some properties are invariant is usually
easier for a small step than for a big step.

Compared to the original MinCov algorithm, which performs the three main
operations in a strict order, the five operations of AbstractMinCov can be
executed in any order. It follows that new exploration heuristics, for instance
the early discarding of subtrees after the discovery of an acceleration, can be
implemented without rewriting any proof of correctness or termination.

In Section 5.1, we introduce the (implicit) coverability trees, the central data
structure of the AbstractMinCov algorithm. In Section 5.2, we present the
five operations of the AbstractMinCov algorithm. Finally, in Section 5.3, we
provide some elements of our termination and correctness Coq proofs.


5.1 Coverability Trees 


We implement the (implicit) coverability trees in Coq as the following inductive
definition KMTree:


Inductive KMTree := | Empty
                    | Br of markingc &
                            bool &
                            {ffun transition -> KMTree}.


As one can see, they are nearly the same as explicit coverability trees: we
just remove the sequence of accelerations that was previously part of the label
of a node. The invariant properties introduced for explicit coverability trees (see
the end of Section 4.1) have straightforward counterparts for the coverability
trees, which are similarly maintained throughout any execution of
AbstractMinCov.


5.2 The Algorithm 


AbstractMinCov also consists of a main while loop that updates a pair
$(\mathcal{T}, A)$, where $\mathcal{T}$ is a coverability tree instead of an explicit one, and $A$ a
finite sequence of accelerations. Initially, the AbstractMinCov algorithm
begins with the pair $(\mathcal{T}, A)$ where $A$ is the empty sequence $\varepsilon$ and $\mathcal{T}$ is the
coverability tree reduced to a single valid front node $N_\varepsilon$ labeled by $\lambda(N_\varepsilon) = x_0$.
This tree is built by the Coq function KMTree_init. Then, at each round of
the loop, it picks one of the five operations whose precondition is met by the
pair, and applies it. It terminates when none of the operations
have preconditions satisfied by the pair $(\mathcal{T}, A)$. At the end, $A$ is discarded
and only $\mathcal{T}$ is returned. As AbstractMinCov is nondeterministic, we
implement it as a relation, like we do for MinCov. More precisely, we implement it
in Coq as a binary relation Rel_small_step on those pairs $(\mathcal{T}, A)$ such that
Rel_small_step (T',A') (T,A) corresponds to a step of AbstractMinCov
from $(\mathcal{T}, A)$ to $(\mathcal{T}', A')$. Hence all possible executions of AbstractMinCov
are encoded into decreasing sequences of Rel_small_step. Thus, by proving
its well-foundedness and its correctness, we prove that every execution of the
AbstractMinCov algorithm is correct and terminates.


Variant Rel_small_step :
  (KMTree * seq acceleration) -> (KMTree * seq acceleration) -> Prop :=
  | Rel_small_step_sat [...]  (* saturating operation *)
  | Rel_small_step_cln [...]  (* cleaning operation *)
  | Rel_small_step_acc [...]  (* accelerating operation *)
  | Rel_small_step_cov [...]  (* covering operation *)
  | Rel_small_step_exp [...]  (* exploring operation *).

In the file MinCov.v, operations of MinCov are proved to be simulated by
sequences of AbstractMinCov operations matching the following regular
expressions (for readability, the prefixes Rel_ and Rel_small_step_ are removed):

$$\mathtt{clean} \subseteq \mathtt{sat}^*\,\mathtt{cln} \qquad \mathtt{accel} \subseteq \mathtt{sat}^*\,\mathtt{acc} \qquad \mathtt{explo} \subseteq \mathtt{sat}^*\,\mathtt{cov}^*\,\mathtt{exp}$$
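
These regular expressions can be used as a quick sanity check on a trace of small steps. The Python sketch below relies on the standard re module; the dictionary PATTERNS, the function simulates, and the encoding of a trace as a list of short operation names are assumptions of this illustration and are unrelated to the Coq simulation proof.

```python
import re

# One pattern per big-step operation, over space-separated small-step names.
PATTERNS = {
    "clean": r"(sat )*cln",
    "accel": r"(sat )*acc",
    "explo": r"(sat )*(cov )*exp",
}


def simulates(big_step, small_steps):
    """True when the list of small steps has the shape allowed for big_step."""
    return re.fullmatch(PATTERNS[big_step], " ".join(small_steps)) is not None
```

For example, the trace sat, sat, cov, exp has the shape of an exploring operation, whereas cov, cln does not have the shape of a cleaning operation.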


In MinCov, accelerations are added to the set $A$ only during the
accelerating operation, and the added acceleration comes from the considered branch of
the tree. In contrast, the five operations of AbstractMinCov allow new
accelerations to be added to $A$. Such accelerations could be computed from the
tree like in MinCov, but they could also be discovered by running an external
heuristic algorithm, for example.


The saturating operation is a small-step version of the already seen function
saturate_KMTree, applying only one acceleration at a time instead of applying
as many accelerations as possible. It can be performed on any front node $N$ of
label $x$ and address ad such that $x \xrightarrow{a} y$ (i.e. y = apply_transitionc x a) and
$x \neq y$, for some $a \in A$ and some $\omega$-marking $y$. The saturating operation simply
sets $\lambda(N)$ to $y$ (which is what the function saturate_a_little a T ad does).


Rel_small_step_sat T A A' ad mc (a:acceleration) mc': Is_Front T ad
  -> List.In a A
  -> Some mc = m_from_add T ad
  -> Some mc' = apply_transitionc mc a
  -> mc != mc'
  -> Rel_small_step (saturate_a_little a T ad, A'++A) (T,A)


The cleaning operation is basically the same as the one of MinCov. The
difference is that now the $\omega$-marking of the considered node is required to be already
saturated (which can be obtained via the Rel_small_step_sat operation). Also
note that the removeE_add function has been replaced by the remove_add
function (with the same behavior) because of the change from KMTE to KMTree. This
is also the case for several other functions in the other operations.


384 T. Hilaire, D. Ilcinkas, and J. Leroux 


Rel_small_step_cln T A A' ad: Is_Front T ad
  -> saturated_node A T ad
  -> ad_covered_not_front T ad
  -> Rel_small_step (remove_add T ad, A'++A) (T,A)


The accelerating operation is abstracted compared to the MinCov equivalent
operation. More precisely, the acceleration used to justify the cut of the branch
via the to_Front function may come from previous stages of the algorithm, or
be guessed during the operation. In the latter case, the acceleration may be
computed as in MinCov. It follows that subtrees rooted in non-saturated nodes
can be discarded earlier than in MinCov.


Rel_small_step_acc T A A' ad mc : ~~ Is_Front T ad
  -> Some mc = m_from_add T ad
  -> ~~ (saturated_markingc mc (A'++A))
  -> Rel_small_step (to_Front T ad, A'++A) (T,A)


The covering operation removes a node of $\mathcal{T}$ when it is covered by a node in
$\mathrm{Front}(\mathcal{T})$. It corresponds to a part of the exploring operation of MinCov. The
non-prefix requirement is here to ensure that a front node does not trigger its
own deletion.


Rel_small_step_cov T A A' ad mc ad' mc': Is_Front T ad
  -> Some mc = m_from_add T ad
  -> Some mc' = m_from_add T ad'
  -> mc' <= mc
  -> ~~ prefix ad' ad
  -> Rel_small_step (remove_add T ad', A'++A) (T,A)
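
The role of the non-prefix requirement is easy to see on a flattened tree. The Python sketch below lists the addresses that a given front node may remove through the covering operation; it is an illustration under assumptions, with hypothetical names (is_prefix, leq, removable_by_covering) and a tree encoded as a dictionary from addresses (tuples of transition names) to labels.

```python
def is_prefix(u, v):
    """True when the address u is a prefix of the address v (u == v included)."""
    return len(u) <= len(v) and v[:len(u)] == u


def leq(x, y):
    """Component-wise order on markings."""
    return all(x[p] <= y[p] for p in x)


def removable_by_covering(tree, ad):
    """Addresses whose label is covered by the label at ad, excluding prefixes of ad.

    tree maps addresses to labels; ad is assumed to be the address of a front node.
    """
    mc = tree[ad]
    return [
        other for other, mc_other in tree.items()
        if leq(mc_other, mc) and not is_prefix(other, ad)
    ]
```

Since every address is a prefix of itself, the front node never appears in the result, and neither do its ancestors, even when their labels are smaller.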


The exploring operation is an abstracted version of the one in MinCov.
It only performs the extension of some front node $N$ without any additional
transformation. However, stronger requirements are needed. Namely, $N$ must be
already saturated (this can be obtained thanks to the saturating operation), and
the non-front nodes must satisfy the Not_Front_Antichain property once the
front flag of $N$ is switched to false (this can be obtained thanks to the covering
operation).


Rel_small_step_exp T A A' ad: Is_Front T ad
  -> saturated_node A T ad
  -> Not_Front_Antichain (remove_Front T ad)
  -> Rel_small_step (Front_extension T ad, A'++A) (T,A).


5.3 Certification 


Termination proofs of Karp-Miller algorithms are usually based on the fact that
$\leq$ is a well-quasi-order over the set of $\omega$-markings. As in [33], we replace this
classical notion with the notion of almost-full relation [32]. This order is however
just an ingredient and further arguments are needed. This is especially true for
MinCov, because the tree maintained in this algorithm may not only grow, as
in the original Karp-Miller algorithm, but also shrink. The code can be found
in the file Termination.v, including the following theorem, where Acc is the
predicate of the Coq standard library used in the constructive definition of
well-foundedness.


Theorem wf_Rel_small_step: forall (T : KMTree) (A : seq acceleration),
  Front_leaves T ->
  Not_Front_Antichain T ->
  Acc Rel_small_step (T,A).


This theorem is proved thanks to a general well-founded rewriting relation 
on trees described in the file wor_tree.v. 
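
As a reminder of what the classical well-quasi-order argument provides, every infinite sequence of $\omega$-markings contains two positions $i < j$ whose markings are ordered component-wise. The Python sketch below searches for such a pair in a finite prefix; it illustrates the classical notion only, not the almost-full formulation used in the Coq development, and the names leq and first_good_pair are hypothetical.

```python
def leq(x, y):
    """Component-wise order on markings."""
    return all(x[p] <= y[p] for p in x)


def first_good_pair(markings):
    """Return the first (i, j) with i < j and markings[i] <= markings[j], or None."""
    for j in range(len(markings)):
        for i in range(j):
            if leq(markings[i], markings[j]):
                return i, j
    return None
```

A finite prefix may of course be an antichain, in which case the function returns None; the well-quasi-order property only states that such prefixes cannot be extended forever.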


Our correctness proof in Coq is close to the pen-and-paper one of
MinCov [12]. Whereas the correctness proof of the original Karp-Miller algorithm
is based on branches, operations on trees performed by MinCov depend on the
complete tree. The correctness proof can be found in the file Correctness.v,
whose main theorem is the following one, where clos_refl_trans_1n is the
predicate for the reflexive and transitive closure, and Markings_of_T computes
the list of all $\omega$-markings of the input coverability tree.


Theorem Correctness T A (m0: marking):
  clos_refl_trans_1n _ Rel_small_step (T,A) (KMTree_init m0) ->
  (forall T' A', ~ Rel_small_step (T',A') (T,A)) ->
  clover m0 (Markings_of_T T).


As in [12], this theorem is a corollary of two results, corresponding to the 
two directions of the equivalence in the clover definition. 

The main theorem of the file KMTrees.v, shown below, provides the first
direction by observing that the desired implication follows from the consistency
properties mentioned in Sections 4.1 and 5.1. The fact that these properties are
invariant (proved in file AbstractMinCov.v) implies that this implication is in
fact satisfied throughout the execution and not just when the algorithm has
terminated.


Theorem cover_consistent_KMTree A m0 T:
  consistent_tree A T ->
  consistent_head A m0 T ->
  forall (mc: markingc) m,
    mc \in Markings_of_T T ->
    m \in mc ->
    coverable m0 m.


The other direction is the main theorem of file Completeness.v.


Theorem Rel_small_step_all_covered T A (m0: marking):
  clos_refl_trans_1n _ Rel_small_step (T,A) (KMTree_init m0) ->
  (forall T' A', ~ Rel_small_step (T',A') (T,A)) ->
  forall m, coverable m0 m -> exists (mc:markingc),
    mc \in Markings_of_T T /\
    m \in mc.


The following table summarizes the sizes of the formalization of [33] and of ours. We
import and use all files from [33] except the Karp-Miller part.


Formalization            Part                         Size

[33] (commit bbb0668)    Technical tools              631 lines
                         Petri net                    1226 lines
                         Karp-Miller                  775 lines

[This paper]             Technical tools              1790 lines
                         Petri net extension          1869 lines
                         MinCov and AbstractMinCov    55590 lines


6 Conclusion 


We provide a complete Coq certification of MinCov, an algorithm that
computes the minimal basis of the coverability set (of a Petri net with an initial
marking). Our development is obtained by introducing a small-step variant of
that algorithm, called AbstractMinCov. This variant consists of smaller and
more abstract steps than MinCov, which can be performed in any order.
This gives a lot of freedom to an actual implementation of the algorithm,
leaving room for heuristics. In particular, the step Rel_small_step_acc can prune
any subtree rooted on a non-saturated node. Note that such a subtree is
necessarily removed at some step of the MinCov algorithm, since every node is
saturated when the algorithm terminates. This early removal will decrease the
total number of node comparisons that are performed by operations
maintaining the antichain property (Rel_small_step_cln and Rel_small_step_cov). It
would be interesting to quantify the actual impact of such a strategy, and more
generally, of all the heuristics permitted by our AbstractMinCov algorithm.

The constructive logic of Coq provides automatic correct-by-construction
OCaml code extraction. Extraction is however not currently possible for our
development because we use relations to describe the algorithms in order to
preserve their non-determinism. It would be interesting in future work to
implement choice functions and boolean versions of our Prop predicates, and to
benchmark the extracted code against the existing Python implementation of
MinCov. Since most of our predicates are already boolean functions (although
their boolean nature is hidden by a coercion), we think that obtaining an OCaml
extraction would be reasonably easy. However, obtaining an efficient one would
require a significant additional amount of work.